Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] a saner API for allocating DMA addressable pages
Hi all,
this series replaced the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs
with a separate new dma_alloc_pages API, which is available on all
platforms.  In addition to cleaning up the convoluted code path, this
ensures that other drivers that have asked for better support for
non-coherent DMA to pages with incurring bounce buffering over can finally
be properly supported.
I'm still a little unsure about the API naming, as alloc_pages sort of
implies a struct page return value, but we return a kernel virtual
address.  The other alternative would be to name the API
dma_alloc_noncoherent, but the whole non-coherent naming seems to put
people off.  As a follow up I plan to move the implementation of the
DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
that is also is a fundamentally non coherent allocation.  The replacement
for that flag would then return a struct page, as it is allowed to
actually return pages without a kernel mapping as the name suggested
(although most of the time they will actually have a kernel mapping..)
In addition to the conversions of the existing non-coherent DMA users
the last three patches also convert the DMA coherent allocations in
the NVMe driver to use this new framework through a dmapool addition.
This was both to give me a good testing vehicle, but also because it
should speed up the NVMe driver on platforms with non-coherent DMA
nicely, without a downside on platforms with cache coherent DMA.
A git tree is available here:
    git://git.infradead.org/users/hch/misc.git dma_alloc_pages
Gitweb:
   
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages
Diffstat:
 Documentation/core-api/dma-api.rst                       |   92 ++----
 Documentation/core-api/dma-attributes.rst                |    8 
 Documentation/userspace-api/media/v4l/buffer.rst         |   17 -
 Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst |    1 
 arch/alpha/kernel/pci_iommu.c                            |    2 
 arch/arm/include/asm/dma-direct.h                        |    4 
 arch/arm/mm/dma-mapping-nommu.c                          |    2 
 arch/arm/mm/dma-mapping.c                                |    4 
 arch/ia64/Kconfig                                        |    3 
 arch/ia64/hp/common/sba_iommu.c                          |    2 
 arch/ia64/kernel/dma-mapping.c                           |   14 
 arch/ia64/mm/init.c                                      |    3 
 arch/mips/Kconfig                                        |    1 
 arch/mips/bmips/dma.c                                    |    4 
 arch/mips/cavium-octeon/dma-octeon.c                     |    4 
 arch/mips/include/asm/dma-direct.h                       |    4 
 arch/mips/include/asm/jazzdma.h                          |    2 
 arch/mips/jazz/jazzdma.c                                 |  102 +------
 arch/mips/loongson2ef/fuloong-2e/dma.c                   |    4 
 arch/mips/loongson2ef/lemote-2f/dma.c                    |    4 
 arch/mips/loongson64/dma.c                               |    4 
 arch/mips/mm/dma-noncoherent.c                           |   48 +--
 arch/mips/pci/pci-ar2315.c                               |    4 
 arch/mips/pci/pci-xtalk-bridge.c                         |    4 
 arch/mips/sgi-ip32/ip32-dma.c                            |    4 
 arch/parisc/Kconfig                                      |    1 
 arch/parisc/kernel/pci-dma.c                             |    6 
 arch/powerpc/include/asm/dma-direct.h                    |    4 
 arch/powerpc/kernel/dma-iommu.c                          |    2 
 arch/powerpc/platforms/ps3/system-bus.c                  |    4 
 arch/powerpc/platforms/pseries/vio.c                     |    2 
 arch/s390/pci/pci_dma.c                                  |    2 
 arch/x86/kernel/amd_gart_64.c                            |    8 
 drivers/gpu/drm/exynos/exynos_drm_gem.c                  |    2 
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c      |    3 
 drivers/iommu/dma-iommu.c                                |    2 
 drivers/iommu/intel/iommu.c                              |    6 
 drivers/media/common/videobuf2/videobuf2-core.c          |   36 --
 drivers/media/common/videobuf2/videobuf2-dma-contig.c    |   19 -
 drivers/media/common/videobuf2/videobuf2-dma-sg.c        |    3 
 drivers/media/common/videobuf2/videobuf2-v4l2.c          |   12 
 drivers/net/ethernet/amd/au1000_eth.c                    |   15 -
 drivers/net/ethernet/i825xx/lasi_82596.c                 |   36 +-
 drivers/net/ethernet/i825xx/lib82596.c                   |  148 +++++-----
 drivers/net/ethernet/i825xx/sni_82596.c                  |   23 -
 drivers/net/ethernet/seeq/sgiseeq.c                      |   24 -
 drivers/nvme/host/pci.c                                  |   79 ++---
 drivers/parisc/ccio-dma.c                                |    2 
 drivers/parisc/sba_iommu.c                               |    2 
 drivers/scsi/53c700.c                                    |  120 ++++----
 drivers/scsi/53c700.h                                    |    9 
 drivers/scsi/sgiwd93.c                                   |   14 
 drivers/xen/swiotlb-xen.c                                |    2 
 include/linux/dma-direct.h                               |   55 ++-
 include/linux/dma-mapping.h                              |   32 +-
 include/linux/dma-noncoherent.h                          |   21 -
 include/linux/dmapool.h                                  |   23 +
 include/linux/gfp.h                                      |    6 
 include/media/videobuf2-core.h                           |    3 
 include/uapi/linux/videodev2.h                           |    2 
 kernel/dma/Kconfig                                       |    9 
 kernel/dma/Makefile                                      |    1 
 kernel/dma/coherent.c                                    |   17 +
 kernel/dma/direct.c                                      |  112 +++++--
 kernel/dma/mapping.c                                     |  104 ++-----
 kernel/dma/ops_helpers.c                                 |   86 ++++++
 kernel/dma/pool.c                                        |    2 
 kernel/dma/swiotlb.c                                     |    4 
 kernel/dma/virt.c                                        |    2 
 mm/dmapool.c                                             |  211 +++++++++------
 sound/mips/hal2.c                                        |   58 +---
 71 files changed, 872 insertions(+), 803 deletions(-)
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 01/28] mm: turn alloc_pages into an inline function
To prevent a compiler error when a method call alloc_pages is
added (which I plan to for the dma_map_ops).
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 include/linux/gfp.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b98..dd2577c5407112 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -550,8 +550,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int
order,
 #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
 	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
-#define alloc_pages(gfp_mask, order) \
-		alloc_pages_node(numa_node_id(), gfp_mask, order)
+static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
+{
+	return alloc_pages_node(numa_node_id(), gfp_mask, order);
+}
 #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
 	alloc_pages(gfp_mask, order)
 #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 02/28] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT
DMA_ATTR_NON_CONSISTENT is a no-op except on PARISC and some mips configs, so don't set it in this ARM specific driver. Signed-off-by: Christoph Hellwig <hch at lst.de> --- drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index efa476858db54b..07073222b8f691 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -42,8 +42,6 @@ static int exynos_drm_alloc_buf(struct exynos_drm_gem *exynos_gem, bool kvmap) if (exynos_gem->flags & EXYNOS_BO_WC || !(exynos_gem->flags & EXYNOS_BO_CACHABLE)) attr |= DMA_ATTR_WRITE_COMBINE; - else - attr |= DMA_ATTR_NON_CONSISTENT; /* FBDev emulation requires kernel mapping */ if (!kvmap) -- 2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 03/28] drm/nouveau/gk20a: stop setting DMA_ATTR_NON_CONSISTENT
DMA_ATTR_NON_CONSISTENT is a no-op except on PARISC and some mips
configs, so don't set it in this ARM specific driver part.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index 985f2990ab0dda..13d4d7ac0697b4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -594,8 +594,7 @@ gk20a_instmem_new(struct nvkm_device *device, int index,
 
 		nvkm_info(&imem->base.subdev, "using IOMMU\n");
 	} else {
-		imem->attrs = DMA_ATTR_NON_CONSISTENT |
-			      DMA_ATTR_WEAK_ORDERING |
+		imem->attrs = DMA_ATTR_WEAK_ORDERING |
 			      DMA_ATTR_WRITE_COMBINE;
 
 		nvkm_info(&imem->base.subdev, "using DMA API\n");
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 04/28] net/au1000-eth: stop using DMA_ATTR_NON_CONSISTENT
The au1000-eth driver contains none of the manual cache synchronization
required for using DMA_ATTR_NON_CONSISTENT.  From what I can tell it
can be used on both dma coherent and non-coherent DMA platforms, but
I suspect it has been buggy on the non-coherent platforms all along.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/net/ethernet/amd/au1000_eth.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/amd/au1000_eth.c
b/drivers/net/ethernet/amd/au1000_eth.c
index 75dbd221dc594b..19e195420e2434 100644
--- a/drivers/net/ethernet/amd/au1000_eth.c
+++ b/drivers/net/ethernet/amd/au1000_eth.c
@@ -1131,10 +1131,9 @@ static int au1000_probe(struct platform_device *pdev)
 	/* Allocate the data buffers
 	 * Snooping works fine with eth on all au1xxx
 	 */
-	aup->vaddr = (u32)dma_alloc_attrs(&pdev->dev, MAX_BUF_SIZE *
+	aup->vaddr = (u32)dma_alloc_coherent(&pdev->dev, MAX_BUF_SIZE *
 					  (NUM_TX_BUFFS + NUM_RX_BUFFS),
-					  &aup->dma_addr, 0,
-					  DMA_ATTR_NON_CONSISTENT);
+					  &aup->dma_addr, 0);
 	if (!aup->vaddr) {
 		dev_err(&pdev->dev, "failed to allocate data buffers\n");
 		err = -ENOMEM;
@@ -1310,9 +1309,8 @@ static int au1000_probe(struct platform_device *pdev)
 err_remap2:
 	iounmap(aup->mac);
 err_remap1:
-	dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS +
NUM_RX_BUFFS),
-			(void *)aup->vaddr, aup->dma_addr,
-			DMA_ATTR_NON_CONSISTENT);
+	dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS +
NUM_RX_BUFFS),
+			(void *)aup->vaddr, aup->dma_addr);
 err_vaddr:
 	free_netdev(dev);
 err_alloc:
@@ -1344,9 +1342,8 @@ static int au1000_remove(struct platform_device *pdev)
 		if (aup->tx_db_inuse[i])
 			au1000_ReleaseDB(aup, aup->tx_db_inuse[i]);
 
-	dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS +
NUM_RX_BUFFS),
-			(void *)aup->vaddr, aup->dma_addr,
-			DMA_ATTR_NON_CONSISTENT);
+	dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS +
NUM_RX_BUFFS),
+			(void *)aup->vaddr, aup->dma_addr);
 
 	iounmap(aup->macdma);
 	iounmap(aup->mac);
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, and causes
weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
unimplemented except on PARISC and some MIPS configs, and about to be
removed.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 .../userspace-api/media/v4l/buffer.rst        | 17 ---------
 .../media/v4l/vidioc-reqbufs.rst              |  1 -
 .../media/common/videobuf2/videobuf2-core.c   | 36 +------------------
 .../common/videobuf2/videobuf2-dma-contig.c   | 19 ----------
 .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
 .../media/common/videobuf2/videobuf2-v4l2.c   | 12 -------
 include/media/videobuf2-core.h                |  3 +-
 include/uapi/linux/videodev2.h                |  2 --
 8 files changed, 3 insertions(+), 90 deletions(-)
diff --git a/Documentation/userspace-api/media/v4l/buffer.rst
b/Documentation/userspace-api/media/v4l/buffer.rst
index 57e752aaf414a7..2044ed13cd9d7d 100644
--- a/Documentation/userspace-api/media/v4l/buffer.rst
+++ b/Documentation/userspace-api/media/v4l/buffer.rst
@@ -701,23 +701,6 @@ Memory Consistency Flags
     :stub-columns: 0
     :widths:       3 1 4
 
-    * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
-
-      - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
-      - 0x00000001
-      - A buffer is allocated either in consistent (it will be automatically
-	coherent between the CPU and the bus) or non-consistent memory. The
-	latter can provide performance gains, for instance the CPU cache
-	sync/flush operations can be avoided if the buffer is accessed by the
-	corresponding device only and the CPU does not read/write to/from that
-	buffer. However, this requires extra care from the driver -- it must
-	guarantee memory consistency by issuing a cache flush/sync when
-	consistency is needed. If this flag is set V4L2 will attempt to
-	allocate the buffer in non-consistent memory. The flag takes effect
-	only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
-	queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
-	<V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
-
 .. c:type:: v4l2_memory
 
 enum v4l2_memory
diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
index 75d894d9c36c42..3180c111d368ee 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
@@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
       - This capability is set by the driver to indicate that the queue
supports
         cache and memory management hints. However, it's only valid when
the
         queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
-        :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT
<V4L2-FLAG-MEMORY-NON-CONSISTENT>`,
         :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE
<V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
         :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN
<V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
 
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c
b/drivers/media/common/videobuf2/videobuf2-core.c
index f544d3393e9d6b..66a41cef33c1b1 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
 }
 EXPORT_SYMBOL(vb2_verify_memory_type);
 
-static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
-{
-	q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
-
-	if (!vb2_queue_allows_cache_hints(q))
-		return;
-	if (!consistent_mem)
-		q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
-}
-
-static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
-{
-	bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
-
-	if (consistent_mem != queue_is_consistent) {
-		dprintk(q, 1, "memory consistency model mismatch\n");
-		return false;
-	}
-	return true;
-}
-
 int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 		     unsigned int flags, unsigned int *count)
 {
 	unsigned int num_buffers, allocated_buffers, num_planes = 0;
 	unsigned plane_sizes[VB2_MAX_PLANES] = { };
-	bool consistent_mem = true;
 	unsigned int i;
 	int ret;
 
-	if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
-		consistent_mem = false;
-
 	if (q->streaming) {
 		dprintk(q, 1, "streaming active\n");
 		return -EBUSY;
@@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory
memory,
 	}
 
 	if (*count == 0 || q->num_buffers != 0 ||
-	    (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
-	    !verify_consistency_attr(q, consistent_mem)) {
+	    (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
 		/*
 		 * We already have buffers allocated, so first check if they
 		 * are not in use and can be freed.
@@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory
memory,
 	num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME);
 	memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
 	q->memory = memory;
-	set_queue_consistency(q, consistent_mem);
 
 	/*
 	 * Ask the driver how many buffers and planes per buffer it requires.
@@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum
vb2_memory memory,
 {
 	unsigned int num_planes = 0, num_buffers, allocated_buffers;
 	unsigned plane_sizes[VB2_MAX_PLANES] = { };
-	bool consistent_mem = true;
 	int ret;
 
-	if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
-		consistent_mem = false;
-
 	if (q->num_buffers == VB2_MAX_FRAME) {
 		dprintk(q, 1, "maximum number of buffers already allocated\n");
 		return -ENOBUFS;
@@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum
vb2_memory memory,
 		}
 		memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
 		q->memory = memory;
-		set_queue_consistency(q, consistent_mem);
 		q->waiting_for_buffers = !q->is_output;
 	} else {
 		if (q->memory != memory) {
 			dprintk(q, 1, "memory model mismatch\n");
 			return -EINVAL;
 		}
-		if (!verify_consistency_attr(q, consistent_mem))
-			return -EINVAL;
 	}
 
 	num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers);
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index ec3446cc45b8da..7b1b86ec942d7d 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -42,11 +42,6 @@ struct vb2_dc_buf {
 	struct dma_buf_attachment	*db_attach;
 };
 
-static inline bool vb2_dc_buffer_consistent(unsigned long attr)
-{
-	return !(attr & DMA_ATTR_NON_CONSISTENT);
-}
-
 /*********************************************/
 /*        scatterlist table functions        */
 /*********************************************/
@@ -341,13 +336,6 @@ static int
 vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
 				   enum dma_data_direction direction)
 {
-	struct vb2_dc_buf *buf = dbuf->priv;
-	struct sg_table *sgt = buf->dma_sgt;
-
-	if (vb2_dc_buffer_consistent(buf->attrs))
-		return 0;
-
-	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
 	return 0;
 }
 
@@ -355,13 +343,6 @@ static int
 vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
 				 enum dma_data_direction direction)
 {
-	struct vb2_dc_buf *buf = dbuf->priv;
-	struct sg_table *sgt = buf->dma_sgt;
-
-	if (vb2_dc_buffer_consistent(buf->attrs))
-		return 0;
-
-	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents,
buf->dma_dir);
 	return 0;
 }
 
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned
long dma_attrs,
 	/*
 	 * NOTE: dma-sg allocates memory using the page allocator directly, so
 	 * there is no memory consistency guarantee, hence dma-sg ignores DMA
-	 * attributes passed from the upper layer. That means that
-	 * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers.
+	 * attributes passed from the upper layer.
 	 */
 	buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *),
 				    GFP_KERNEL | __GFP_ZERO);
diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c
b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index 30caad27281e1a..de83ad48783821 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
 #endif
 }
 
-static void clear_consistency_attr(struct vb2_queue *q,
-				   int memory,
-				   unsigned int *flags)
-{
-	if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
-		*flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;
-}
-
 int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
 {
 	int ret = vb2_verify_memory_type(q, req->memory, req->type);
 
 	fill_buf_caps(q, &req->capabilities);
-	clear_consistency_attr(q, req->memory, &req->flags);
 	return ret ? ret : vb2_core_reqbufs(q, req->memory,
 					    req->flags, &req->count);
 }
@@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct
v4l2_create_buffers *create)
 	unsigned i;
 
 	fill_buf_caps(q, &create->capabilities);
-	clear_consistency_attr(q, create->memory, &create->flags);
 	create->index = q->num_buffers;
 	if (create->count == 0)
 		return ret != -EBUSY ? ret : 0;
@@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
 	int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
 
 	fill_buf_caps(vdev->queue, &p->capabilities);
-	clear_consistency_attr(vdev->queue, p->memory, &p->flags);
 	if (res)
 		return res;
 	if (vb2_queue_is_busy(vdev, file))
@@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
 
 	p->index = vdev->queue->num_buffers;
 	fill_buf_caps(vdev->queue, &p->capabilities);
-	clear_consistency_attr(vdev->queue, p->memory, &p->flags);
 	/*
 	 * If count == 0, then just check if memory and type are valid.
 	 * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 52ef92049073e3..4c7f25b07e9375 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int
index, void *pb);
  * vb2_core_reqbufs() - Initiate streaming.
  * @q:		pointer to &struct vb2_queue with videobuf2 queue.
  * @memory:	memory type, as defined by &enum vb2_memory.
- * @flags:	auxiliary queue/buffer management flags. Currently, the only
- *		used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT.
+ * @flags:	auxiliary queue/buffer management flags.
  * @count:	requested buffer count.
  *
  * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index c7b70ff53bc1dd..5c00f63d9c1b58 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -191,8 +191,6 @@ enum v4l2_memory {
 	V4L2_MEMORY_DMABUF           = 4,
 };
 
-#define V4L2_FLAG_MEMORY_NON_CONSISTENT		(1 << 0)
-
 /* see also http://vektor.theorem.ca/graphics/ycbcr/ */
 enum v4l2_colorspace {
 	/*
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe
This allows us to get rid of the LIB82596_DMA_ATTR defined and prepare
for untangling the coherent vs non-coherent DMA allocation API.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++++++++++------
 drivers/net/ethernet/i825xx/lib82596.c   | 36 ++++++++----------------
 drivers/net/ethernet/i825xx/sni_82596.c  | 19 +++++++++----
 3 files changed, 40 insertions(+), 39 deletions(-)
diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c
b/drivers/net/ethernet/i825xx/lasi_82596.c
index aec7e98bcc853a..8c5ab9b7553e75 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,8 +96,6 @@
 
 #define OPT_SWAP_PORT	0x0001	/* Need to wordswp on the MPU port */
 
-#define LIB82596_DMA_ATTR	DMA_ATTR_NON_CONSISTENT
-
 #define DMA_WBACK(ndev, addr, len) \
 	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE);
} while (0)
 
@@ -155,7 +153,7 @@ lan_init_chip(struct parisc_device *dev)
 {
 	struct	net_device *netdevice;
 	struct i596_private *lp;
-	int	retval;
+	int retval = -ENOMEM;
 	int i;
 
 	if (!dev->irq) {
@@ -186,12 +184,22 @@ lan_init_chip(struct parisc_device *dev)
 
 	lp = netdev_priv(netdevice);
 	lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
+	lp->dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
+			      &lp->dma_addr, GFP_KERNEL,
+			      DMA_ATTR_NON_CONSISTENT);
+	if (!lp->dma)
+		goto out_free_netdev;
 
 	retval = i82596_probe(netdevice);
-	if (retval) {
-		free_netdev(netdevice);
-		return -ENODEV;
-	}
+	if (retval)
+		goto out_free_dma;
+	return 0;
+
+out_free_dma:
+	dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
+		       lp->dma, lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+out_free_netdev:
+	free_netdev(netdevice);
 	return retval;
 }
 
@@ -202,7 +210,7 @@ static int __exit lan_remove_chip(struct parisc_device
*pdev)
 
 	unregister_netdev (dev);
 	dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
-		       lp->dma_addr, LIB82596_DMA_ATTR);
+		       lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
 	free_netdev (dev);
 	return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/lib82596.c
b/drivers/net/ethernet/i825xx/lib82596.c
index b03757e169e475..b4e4b3eb5758b5 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -1047,9 +1047,8 @@ static const struct net_device_ops i596_netdev_ops = {
 
 static int i82596_probe(struct net_device *dev)
 {
-	int i;
 	struct i596_private *lp = netdev_priv(dev);
-	struct i596_dma *dma;
+	int ret;
 
 	/* This lot is ensure things have been cache line aligned. */
 	BUILD_BUG_ON(sizeof(struct i596_rfd) != 32);
@@ -1063,41 +1062,28 @@ static int i82596_probe(struct net_device *dev)
 	if (!dev->base_addr || !dev->irq)
 		return -ENODEV;
 
-	dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
-			      &lp->dma_addr, GFP_KERNEL,
-			      LIB82596_DMA_ATTR);
-	if (!dma) {
-		printk(KERN_ERR "%s: Couldn't get shared memory\n", __FILE__);
-		return -ENOMEM;
-	}
-
 	dev->netdev_ops = &i596_netdev_ops;
 	dev->watchdog_timeo = TX_TIMEOUT;
 
-	memset(dma, 0, sizeof(struct i596_dma));
-	lp->dma = dma;
-
-	dma->scb.command = 0;
-	dma->scb.cmd = I596_NULL;
-	dma->scb.rfd = I596_NULL;
+	memset(lp->dma, 0, sizeof(struct i596_dma));
+	lp->dma->scb.command = 0;
+	lp->dma->scb.cmd = I596_NULL;
+	lp->dma->scb.rfd = I596_NULL;
 	spin_lock_init(&lp->lock);
 
-	DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+	DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));
 
-	i = register_netdev(dev);
-	if (i) {
-		dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
-			       dma, lp->dma_addr, LIB82596_DMA_ATTR);
-		return i;
-	}
+	ret = register_netdev(dev);
+	if (ret)
+		return ret;
 
 	DEB(DEB_PROBE, printk(KERN_INFO "%s: 82596 at %#3lx, %pM IRQ %d.\n",
 			      dev->name, dev->base_addr, dev->dev_addr,
 			      dev->irq));
 	DEB(DEB_INIT, printk(KERN_INFO
 			     "%s: dma at 0x%p (%d bytes), lp->scb at 0x%p\n",
-			     dev->name, dma, (int)sizeof(struct i596_dma),
-			     &dma->scb));
+			     dev->name, lp->dma, (int)sizeof(struct i596_dma),
+			     &lp->dma->scb));
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/sni_82596.c
b/drivers/net/ethernet/i825xx/sni_82596.c
index 22f5887578b2bd..e80e790ffbd4d4 100644
--- a/drivers/net/ethernet/i825xx/sni_82596.c
+++ b/drivers/net/ethernet/i825xx/sni_82596.c
@@ -24,8 +24,6 @@
 
 static const char sni_82596_string[] = "snirm_82596";
 
-#define LIB82596_DMA_ATTR	0
-
 #define DMA_WBACK(priv, addr, len)     do { } while (0)
 #define DMA_INV(priv, addr, len)       do { } while (0)
 #define DMA_WBACK_INV(priv, addr, len) do { } while (0)
@@ -134,10 +132,19 @@ static int sni_82596_probe(struct platform_device *dev)
 	lp->ca = ca_addr;
 	lp->mpu_port = mpu_addr;
 
+	lp->dma = dma_alloc_coherent(dev->dev.parent, sizeof(struct i596_dma),
+				     &lp->dma_addr, GFP_KERNEL);
+	if (!lp->dma)
+		goto probe_failed;
+
 	retval = i82596_probe(netdevice);
-	if (retval == 0)
-		return 0;
+	if (retval)
+		goto probe_failed_free_dma;
+	return 0;
 
+probe_failed_free_dma:
+	dma_free_coherent(dev->dev.parent, sizeof(struct i596_dma), lp->dma,
+			  lp->dma_addr);
 probe_failed:
 	free_netdev(netdevice);
 probe_failed_free_ca:
@@ -153,8 +160,8 @@ static int sni_82596_driver_remove(struct platform_device
*pdev)
 	struct i596_private *lp = netdev_priv(dev);
 
 	unregister_netdev(dev);
-	dma_free_attrs(dev->dev.parent, sizeof(struct i596_private), lp->dma,
-		       lp->dma_addr, LIB82596_DMA_ATTR);
+	dma_free_coherent(dev->dev.parent, sizeof(struct i596_private), lp->dma,
+			  lp->dma_addr);
 	iounmap(lp->ca);
 	iounmap(lp->mpu_port);
 	free_netdev (dev);
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 07/28] 53c700: improve non-coherent DMA handling
Switch the 53c700 driver to only use non-coherent descriptor memory if it
really has to because dma_alloc_coherent fails.  This doesn't matter for
any of the platforms it runs on currently, but that will change soon.
To help with this two new helpers to transfer ownership to and from the
device are added that abstract the syncing of the non-coherent memory.
The two current bidirectional cases are mapped to transfers to the
device, as that appears to what they are used for.  Note that for parisc,
which is the only architecture this driver needs to use non-coherent
memory on, the direction argument of dma_cache_sync is ignored, so this
will not change behavior in any way.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/scsi/53c700.c | 113 +++++++++++++++++++++++-------------------
 drivers/scsi/53c700.h |   9 ++--
 2 files changed, 68 insertions(+), 54 deletions(-)
diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 461b3babb601ef..b197ed9399e2e0 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,6 +269,20 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
 					      spi_period(SDp->sdev_target));
 }
 
+static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
+		void *addr, size_t size)
+{
+	if (h->noncoherent)
+		dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+}
+
+static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
+		void *addr, size_t size)
+{
+	if (h->noncoherent)
+		dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+}
+
 struct Scsi_Host *
 NCR_700_detect(struct scsi_host_template *tpnt,
 	       struct NCR_700_Host_Parameters *hostdata, struct device *dev)
@@ -283,9 +297,13 @@ NCR_700_detect(struct scsi_host_template *tpnt,
 	if(tpnt->sdev_attrs == NULL)
 		tpnt->sdev_attrs = NCR_700_dev_attrs;
 
-	memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
-				 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
-	if(memory == NULL) {
+	memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL);
+	if (!memory) {
+		hostdata->noncoherent = 1;
+		memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
+					 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	}
+	if (!memory) {
 		printk(KERN_ERR "53c700: Failed to allocate memory for driver,
detaching\n");
 		return NULL;
 	}
@@ -339,11 +357,11 @@ NCR_700_detect(struct scsi_host_template *tpnt,
 	for (j = 0; j < PATCHES; j++)
 		script[LABELPATCHES[j]] = bS_to_host(pScript + SCRIPT[LABELPATCHES[j]]);
 	/* now patch up fixed addresses. */
-	script_patch_32(hostdata->dev, script, MessageLocation,
+	script_patch_32(hostdata, script, MessageLocation,
 			pScript + MSGOUT_OFFSET);
-	script_patch_32(hostdata->dev, script, StatusAddress,
+	script_patch_32(hostdata, script, StatusAddress,
 			pScript + STATUS_OFFSET);
-	script_patch_32(hostdata->dev, script, ReceiveMsgAddress,
+	script_patch_32(hostdata, script, ReceiveMsgAddress,
 			pScript + MSGIN_OFFSET);
 
 	hostdata->script = script;
@@ -395,8 +413,12 @@ NCR_700_release(struct Scsi_Host *host)
 	struct NCR_700_Host_Parameters *hostdata = 
 		(struct NCR_700_Host_Parameters *)host->hostdata[0];
 
-	dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
-		       hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+	if (hostdata->noncoherent)
+		dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
+			       hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+	else
+		dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
+				  hostdata->script, hostdata->pScript);
 	return 1;
 }
 
@@ -804,8 +826,8 @@ process_extended_message(struct Scsi_Host *host,
 			shost_printk(KERN_WARNING, host,
 				"Unexpected SDTR msg\n");
 			hostdata->msgout[0] = A_REJECT_MSG;
-			dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-			script_patch_16(hostdata->dev, hostdata->script,
+			dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+			script_patch_16(hostdata, hostdata->script,
 			                MessageCount, 1);
 			/* SendMsgOut returns, so set up the return
 			 * address */
@@ -817,9 +839,8 @@ process_extended_message(struct Scsi_Host *host,
 		printk(KERN_INFO "scsi%d: (%d:%d), Unsolicited WDTR after CMD,
Rejecting\n",
 		       host->host_no, pun, lun);
 		hostdata->msgout[0] = A_REJECT_MSG;
-		dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-		script_patch_16(hostdata->dev, hostdata->script, MessageCount,
-		                1);
+		dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+		script_patch_16(hostdata, hostdata->script, MessageCount, 1);
 		resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
 
 		break;
@@ -832,9 +853,8 @@ process_extended_message(struct Scsi_Host *host,
 		printk("\n");
 		/* just reject it */
 		hostdata->msgout[0] = A_REJECT_MSG;
-		dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-		script_patch_16(hostdata->dev, hostdata->script, MessageCount,
-		                1);
+		dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+		script_patch_16(hostdata, hostdata->script, MessageCount, 1);
 		/* SendMsgOut returns, so set up the return
 		 * address */
 		resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
@@ -917,9 +937,8 @@ process_message(struct Scsi_Host *host,	struct
NCR_700_Host_Parameters *hostdata
 		printk("\n");
 		/* just reject it */
 		hostdata->msgout[0] = A_REJECT_MSG;
-		dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-		script_patch_16(hostdata->dev, hostdata->script, MessageCount,
-		                1);
+		dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+		script_patch_16(hostdata, hostdata->script, MessageCount, 1);
 		/* SendMsgOut returns, so set up the return
 		 * address */
 		resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
@@ -928,7 +947,7 @@ process_message(struct Scsi_Host *host,	struct
NCR_700_Host_Parameters *hostdata
 	}
 	NCR_700_writel(temp, host, TEMP_REG);
 	/* set us up to receive another message */
-	dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE,
DMA_FROM_DEVICE);
+	dma_sync_from_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
 	return resume_offset;
 }
 
@@ -1008,8 +1027,8 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct
scsi_cmnd *SCp,
 				slot->SG[1].ins = bS_to_host(SCRIPT_RETURN);
 				slot->SG[1].pAddr = 0;
 				slot->resume_offset = hostdata->pScript;
-				dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG[0])*2,
DMA_TO_DEVICE);
-				dma_cache_sync(hostdata->dev, SCp->sense_buffer,
SCSI_SENSE_BUFFERSIZE, DMA_FROM_DEVICE);
+				dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG[0])*2);
+				dma_sync_from_dev(hostdata, SCp->sense_buffer, SCSI_SENSE_BUFFERSIZE);
 
 				/* queue the command for reissue */
 				slot->state = NCR_700_SLOT_QUEUED;
@@ -1129,11 +1148,11 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct
scsi_cmnd *SCp,
 			hostdata->cmd = slot->cmnd;
 
 			/* re-patch for this command */
-			script_patch_32_abs(hostdata->dev, hostdata->script,
+			script_patch_32_abs(hostdata, hostdata->script,
 			                    CommandAddress, slot->pCmd);
-			script_patch_16(hostdata->dev, hostdata->script,
+			script_patch_16(hostdata, hostdata->script,
 					CommandCount, slot->cmnd->cmd_len);
-			script_patch_32_abs(hostdata->dev, hostdata->script,
+			script_patch_32_abs(hostdata, hostdata->script,
 			                    SGScriptStartAddress,
 					    to32bit(&slot->pSG[0].ins));
 
@@ -1144,14 +1163,14 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct
scsi_cmnd *SCp,
 			 * should therefore always clear ACK */
 			NCR_700_writeb(NCR_700_get_SXFER(hostdata->cmd->device),
 				       host, SXFER_REG);
-			dma_cache_sync(hostdata->dev, hostdata->msgin,
-				       MSG_ARRAY_SIZE, DMA_FROM_DEVICE);
-			dma_cache_sync(hostdata->dev, hostdata->msgout,
-				       MSG_ARRAY_SIZE, DMA_TO_DEVICE);
+			dma_sync_from_dev(hostdata, hostdata->msgin,
+				       MSG_ARRAY_SIZE);
+			dma_sync_to_dev(hostdata, hostdata->msgout,
+				       MSG_ARRAY_SIZE);
 			/* I'm just being paranoid here, the command should
 			 * already have been flushed from the cache */
-			dma_cache_sync(hostdata->dev, slot->cmnd->cmnd,
-				       slot->cmnd->cmd_len, DMA_TO_DEVICE);
+			dma_sync_to_dev(hostdata, slot->cmnd->cmnd,
+				       slot->cmnd->cmd_len);
 
 
 			
@@ -1214,8 +1233,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct
scsi_cmnd *SCp,
 		hostdata->reselection_id = reselection_id;
 		/* just in case we have a stale simple tag message, clear it */
 		hostdata->msgin[1] = 0;
-		dma_cache_sync(hostdata->dev, hostdata->msgin,
-			       MSG_ARRAY_SIZE, DMA_BIDIRECTIONAL);
+		dma_sync_to_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
 		if(hostdata->tag_negotiated & (1<<reselection_id)) {
 			resume_offset = hostdata->pScript + Ent_GetReselectionWithTag;
 		} else {
@@ -1329,8 +1347,7 @@ process_selection(struct Scsi_Host *host, __u32 dsp)
 	hostdata->cmd = NULL;
 	/* clear any stale simple tag message */
 	hostdata->msgin[1] = 0;
-	dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE,
-		       DMA_BIDIRECTIONAL);
+	dma_sync_to_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
 
 	if(id == 0xff) {
 		/* Selected as target, Ignore */
@@ -1427,30 +1444,26 @@ NCR_700_start_command(struct scsi_cmnd *SCp)
 		NCR_700_set_flag(SCp->device, NCR_700_DEV_BEGIN_SYNC_NEGOTIATION);
 	}
 
-	script_patch_16(hostdata->dev, hostdata->script, MessageCount, count);
-
+	script_patch_16(hostdata, hostdata->script, MessageCount, count);
 
-	script_patch_ID(hostdata->dev, hostdata->script,
-			Device_ID, 1<<scmd_id(SCp));
+	script_patch_ID(hostdata, hostdata->script, Device_ID,
1<<scmd_id(SCp));
 
-	script_patch_32_abs(hostdata->dev, hostdata->script, CommandAddress,
+	script_patch_32_abs(hostdata, hostdata->script, CommandAddress,
 			    slot->pCmd);
-	script_patch_16(hostdata->dev, hostdata->script, CommandCount,
-	                SCp->cmd_len);
+	script_patch_16(hostdata, hostdata->script, CommandCount, SCp->cmd_len);
 	/* finally plumb the beginning of the SG list into the script
 	 * */
-	script_patch_32_abs(hostdata->dev, hostdata->script,
+	script_patch_32_abs(hostdata, hostdata->script,
 	                    SGScriptStartAddress, to32bit(&slot->pSG[0].ins));
 	NCR_700_clear_fifo(SCp->device->host);
 
 	if(slot->resume_offset == 0)
 		slot->resume_offset = hostdata->pScript;
 	/* now perform all the writebacks and invalidates */
-	dma_cache_sync(hostdata->dev, hostdata->msgout, count, DMA_TO_DEVICE);
-	dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE,
-		       DMA_FROM_DEVICE);
-	dma_cache_sync(hostdata->dev, SCp->cmnd, SCp->cmd_len,
DMA_TO_DEVICE);
-	dma_cache_sync(hostdata->dev, hostdata->status, 1, DMA_FROM_DEVICE);
+	dma_sync_to_dev(hostdata, hostdata->msgout, count);
+	dma_sync_from_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
+	dma_sync_to_dev(hostdata, SCp->cmnd, SCp->cmd_len);
+	dma_sync_from_dev(hostdata, hostdata->status, 1);
 
 	/* set the synchronous period/offset */
 	NCR_700_writeb(NCR_700_get_SXFER(SCp->device),
@@ -1626,7 +1639,7 @@ NCR_700_intr(int irq, void *dev_id)
 					slot->SG[i].ins = bS_to_host(SCRIPT_NOP);
 					slot->SG[i].pAddr = 0;
 				}
-				dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG),
DMA_TO_DEVICE);
+				dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG));
 				/* and pretend we disconnected after
 				 * the command phase */
 				resume_offset = hostdata->pScript + Ent_MsgInDuringData;
@@ -1878,7 +1891,7 @@ NCR_700_queuecommand_lck(struct scsi_cmnd *SCp, void
(*done)(struct scsi_cmnd *)
 		}
 		slot->SG[i].ins = bS_to_host(SCRIPT_RETURN);
 		slot->SG[i].pAddr = 0;
-		dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG),
DMA_TO_DEVICE);
+		dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG));
 		DEBUG((" SETTING %p to %x\n",
 		       (&slot->pSG[i].ins),
 		       slot->SG[i].ins));
diff --git a/drivers/scsi/53c700.h b/drivers/scsi/53c700.h
index 05fe439b66afe5..0f545b05fe611d 100644
--- a/drivers/scsi/53c700.h
+++ b/drivers/scsi/53c700.h
@@ -209,6 +209,7 @@ struct NCR_700_Host_Parameters {
 #endif
 	__u32	chip710:1;	/* set if really a 710 not 700 */
 	__u32	burst_length:4;	/* set to 0 to disable 710 bursting */
+	__u32	noncoherent:1;	/* needs to use non-coherent DMA */
 
 	/* NOTHING BELOW HERE NEEDS ALTERING */
 	__u32	fast:1;		/* if we can alter the SCSI bus clock
@@ -429,7 +430,7 @@ struct NCR_700_Host_Parameters {
 	for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
 		__u32 val = bS_to_cpu((script)[A_##symbol##_used[i]]) + da; \
 		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
-		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE);
\
+		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
 		DEBUG((" script, patching %s at %d to %pad\n", \
 		       #symbol, A_##symbol##_used[i], &da)); \
 	} \
@@ -441,7 +442,7 @@ struct NCR_700_Host_Parameters {
 	dma_addr_t da = value; \
 	for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
 		(script)[A_##symbol##_used[i]] = bS_to_host(da); \
-		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE);
\
+		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
 		DEBUG((" script, patching %s at %d to %pad\n", \
 		       #symbol, A_##symbol##_used[i], &da)); \
 	} \
@@ -456,7 +457,7 @@ struct NCR_700_Host_Parameters {
 		val &= 0xff00ffff; \
 		val |= ((value) & 0xff) << 16; \
 		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
-		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE);
\
+		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
 		DEBUG((" script, patching ID field %s at %d to 0x%x\n", \
 		       #symbol, A_##symbol##_used[i], val)); \
 	} \
@@ -470,7 +471,7 @@ struct NCR_700_Host_Parameters {
 		val &= 0xffff0000; \
 		val |= ((value) & 0xffff); \
 		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
-		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE);
\
+		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
 		DEBUG((" script, patching short field %s at %d to 0x%x\n", \
 		       #symbol, A_##symbol##_used[i], val)); \
 	} \
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous
When transferring DMA ownership back to the CPU there should never
be any writeback from the cache, as the buffer was owned by the
device until now.  Instead it should just be invalidated for the
mapping directions where the device could have written data.
Note that the changes rely on the fact that kmap_atomic is stubbed
out for the !HIGHMEM case to simplify the code a bit.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 arch/mips/mm/dma-noncoherent.c | 44 +++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 16 deletions(-)
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 563c2c0d0c8193..97a14adbafc99c 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -55,22 +55,34 @@ void *arch_dma_set_uncached(void *addr, size_t size)
 	return (void *)(__pa(addr) + UNCAC_BASE);
 }
 
-static inline void dma_sync_virt(void *addr, size_t size,
+static inline void dma_sync_virt_for_device(void *addr, size_t size,
 		enum dma_data_direction dir)
 {
 	switch (dir) {
 	case DMA_TO_DEVICE:
 		dma_cache_wback((unsigned long)addr, size);
 		break;
-
 	case DMA_FROM_DEVICE:
 		dma_cache_inv((unsigned long)addr, size);
 		break;
-
 	case DMA_BIDIRECTIONAL:
 		dma_cache_wback_inv((unsigned long)addr, size);
 		break;
+	default:
+		BUG();
+	}
+}
 
+static inline void dma_sync_virt_for_cpu(void *addr, size_t size,
+		enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		dma_cache_inv((unsigned long)addr, size);
+		break;
 	default:
 		BUG();
 	}
@@ -82,7 +94,7 @@ static inline void dma_sync_virt(void *addr, size_t size,
  * configured then the bulk of this loop gets optimized out.
  */
 static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+		enum dma_data_direction dir, bool for_device)
 {
 	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
 	unsigned long offset = paddr & ~PAGE_MASK;
@@ -90,18 +102,20 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t
size,
 
 	do {
 		size_t len = left;
+		void *addr;
 
 		if (PageHighMem(page)) {
-			void *addr;
-
 			if (offset + len > PAGE_SIZE)
 				len = PAGE_SIZE - offset;
+		}
+
+		addr = kmap_atomic(page);
+		if (for_device)
+			dma_sync_virt_for_device(addr + offset, len, dir);
+		else
+			dma_sync_virt_for_cpu(addr + offset, len, dir);
+		kunmap_atomic(addr);
 
-			addr = kmap_atomic(page);
-			dma_sync_virt(addr + offset, len, dir);
-			kunmap_atomic(addr);
-		} else
-			dma_sync_virt(page_address(page) + offset, size, dir);
 		offset = 0;
 		page++;
 		left -= len;
@@ -111,7 +125,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t
size,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	dma_sync_phys(paddr, size, dir);
+	dma_sync_phys(paddr, size, dir, true);
 }
 
 #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
@@ -119,16 +133,14 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
 	if (cpu_needs_post_dma_flush())
-		dma_sync_phys(paddr, size, dir);
+		dma_sync_phys(paddr, size, dir, false);
 }
 #endif
 
 void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		enum dma_data_direction direction)
 {
-	BUG_ON(direction == DMA_NONE);
-
-	dma_sync_virt(vaddr, size, direction);
+	dma_sync_virt_for_device(vaddr, size, direction);
 }
 
 #ifdef CONFIG_DMA_PERDEV_COHERENT
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 arch/mips/include/asm/jazzdma.h |  2 -
 arch/mips/jazz/jazzdma.c        | 70 ---------------------------------
 2 files changed, 72 deletions(-)
diff --git a/arch/mips/include/asm/jazzdma.h b/arch/mips/include/asm/jazzdma.h
index d13f940022d5f9..c831da7fa89803 100644
--- a/arch/mips/include/asm/jazzdma.h
+++ b/arch/mips/include/asm/jazzdma.h
@@ -10,8 +10,6 @@
  */
 extern unsigned long vdma_alloc(unsigned long paddr, unsigned long size);
 extern int vdma_free(unsigned long laddr);
-extern int vdma_remap(unsigned long laddr, unsigned long paddr,
-		      unsigned long size);
 extern unsigned long vdma_phys2log(unsigned long paddr);
 extern unsigned long vdma_log2phys(unsigned long laddr);
 extern void vdma_stats(void);		/* for debugging only */
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index 014773f0bfcd74..fe40dbed04c1d6 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -209,76 +209,6 @@ int vdma_free(unsigned long laddr)
 
 EXPORT_SYMBOL(vdma_free);
 
-/*
- * Map certain page(s) to another physical address.
- * Caller must have allocated the page(s) before.
- */
-int vdma_remap(unsigned long laddr, unsigned long paddr, unsigned long size)
-{
-	int first, pages;
-
-	if (laddr > 0xffffff) {
-		if (vdma_debug)
-			printk
-			    ("vdma_map: Invalid logical address: %08lx\n",
-			     laddr);
-		return -EINVAL; /* invalid logical address */
-	}
-	if (paddr > 0x1fffffff) {
-		if (vdma_debug)
-			printk
-			    ("vdma_map: Invalid physical address: %08lx\n",
-			     paddr);
-		return -EINVAL; /* invalid physical address */
-	}
-
-	pages = (((paddr & (VDMA_PAGESIZE - 1)) + size) >> 12) + 1;
-	first = laddr >> 12;
-	if (vdma_debug)
-		printk("vdma_remap: first=%x, pages=%x\n", first, pages);
-	if (first + pages > VDMA_PGTBL_ENTRIES) {
-		if (vdma_debug)
-			printk("vdma_alloc: Invalid size: %08lx\n", size);
-		return -EINVAL;
-	}
-
-	paddr &= ~(VDMA_PAGESIZE - 1);
-	while (pages > 0 && first < VDMA_PGTBL_ENTRIES) {
-		if (pgtbl[first].owner != laddr) {
-			if (vdma_debug)
-				printk("Trying to remap other's pages.\n");
-			return -EPERM;	/* not owner */
-		}
-		pgtbl[first].frame = paddr;
-		paddr += VDMA_PAGESIZE;
-		first++;
-		pages--;
-	}
-
-	/*
-	 * Update translation table
-	 */
-	r4030_write_reg32(JAZZ_R4030_TRSTBL_INV, 0);
-
-	if (vdma_debug > 2) {
-		int i;
-		pages = (((paddr & (VDMA_PAGESIZE - 1)) + size) >> 12) + 1;
-		first = laddr >> 12;
-		printk("LADDR: ");
-		for (i = first; i < first + pages; i++)
-			printk("%08x ", i << 12);
-		printk("\nPADDR: ");
-		for (i = first; i < first + pages; i++)
-			printk("%08x ", pgtbl[i].frame);
-		printk("\nOWNER: ");
-		for (i = first; i < first + pages; i++)
-			printk("%08x ", pgtbl[i].owner);
-		printk("\n");
-	}
-
-	return 0;
-}
-
 /*
  * Translate a physical address to a logical address.
  * This will return the logical address of the first
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct
The jazzdma ops implement support for a very basic IOMMU.  Thus we really
should not use the dma-direct code that takes physical address limits
into account.  This survived through the great MIPS DMA ops cleanup mostly
because I was lazy, but now it is time to fully split the implementations.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 arch/mips/jazz/jazzdma.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index fe40dbed04c1d6..d0b5a2ba2b1a8a 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -16,7 +16,6 @@
 #include <linux/memblock.h>
 #include <linux/spinlock.h>
 #include <linux/gfp.h>
-#include <linux/dma-direct.h>
 #include <linux/dma-noncoherent.h>
 #include <asm/mipsregs.h>
 #include <asm/jazz.h>
@@ -492,26 +491,38 @@ int vdma_get_enable(int channel)
 static void *jazz_dma_alloc(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
+	struct page *page;
 	void *ret;
 
-	ret = dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-	if (!ret)
-		return NULL;
+	if (attrs & DMA_ATTR_NO_WARN)
+		gfp |= __GFP_NOWARN;
 
-	*dma_handle = vdma_alloc(virt_to_phys(ret), size);
-	if (*dma_handle == DMA_MAPPING_ERROR) {
-		dma_direct_free_pages(dev, size, ret, *dma_handle, attrs);
+	size = PAGE_ALIGN(size);
+	page = alloc_pages(gfp, get_order(size));
+	if (!page)
 		return NULL;
-	}
+	ret = page_address(page);
+	*dma_handle = vdma_alloc(virt_to_phys(ret), size);
+	if (*dma_handle == DMA_MAPPING_ERROR)
+		goto out_free_pages;
+
+	if (attrs & DMA_ATTR_NON_CONSISTENT)
+		return ret;
+	arch_dma_prep_coherent(page, size);
+	return (void *)(UNCAC_BASE + __pa(ret));
 
-	return ret;
+out_free_pages:
+	__free_pages(page, get_order(size));
+	return NULL;
 }
 
 static void jazz_dma_free(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_handle, unsigned long attrs)
 {
 	vdma_free(dma_handle);
-	dma_direct_free_pages(dev, size, vaddr, dma_handle, attrs);
+	if (!(attrs & DMA_ATTR_NON_CONSISTENT))
+		vaddr = __va(vaddr - UNCAC_BASE);
+	__free_pages(virt_to_page(vaddr), get_order(size));
 }
 
 static dma_addr_t jazz_dma_map_page(struct device *dev, struct page *page,
@@ -608,7 +619,6 @@ const struct dma_map_ops jazz_dma_ops = {
 	.sync_single_for_device	= jazz_dma_sync_single_for_device,
 	.sync_sg_for_cpu	= jazz_dma_sync_sg_for_cpu,
 	.sync_sg_for_device	= jazz_dma_sync_sg_for_device,
-	.dma_supported		= dma_direct_supported,
 	.cache_sync		= arch_dma_cache_sync,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 11/28] dma-mapping: add (back) arch_dma_mark_clean for ia64
Add back a hook to optimize dcache flushing after reading executable
code using DMA.  This gets ia64 out of the business of pretending to
be dma incoherent just for this optimization.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 arch/ia64/Kconfig               |  3 +--
 arch/ia64/kernel/dma-mapping.c  | 14 +-------------
 arch/ia64/mm/init.c             |  3 +--
 include/linux/dma-direct.h      |  3 +++
 include/linux/dma-noncoherent.h |  8 ++++++++
 kernel/dma/Kconfig              |  6 ++++++
 kernel/dma/direct.c             |  3 +++
 7 files changed, 23 insertions(+), 17 deletions(-)
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 5b4ec80bf5863a..513ba0c5d33610 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -8,6 +8,7 @@ menu "Processor type and features"
 
 config IA64
 	bool
+	select ARCH_HAS_DMA_MARK_CLEAN
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ACPI
@@ -32,8 +33,6 @@ config IA64
 	select TTY
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_VIRT_CPU_ACCOUNTING
-	select DMA_NONCOHERENT_MMAP
-	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select VIRT_TO_BUS
 	select GENERIC_IRQ_PROBE
 	select GENERIC_PENDING_IRQ if SMP
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 09ef9ce9988d1f..f640ed6fe1d576 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-#include <linux/dma-direct.h>
+#include <linux/dma-mapping.h>
 #include <linux/export.h>
 
 /* Set this to 1 if there is a HW IOMMU in the system */
@@ -7,15 +7,3 @@ int iommu_detected __read_mostly;
 
 const struct dma_map_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
-
-void *arch_dma_alloc(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
-{
-	return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-}
-
-void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
-		dma_addr_t dma_addr, unsigned long attrs)
-{
-	dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
-}
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 0b3fb4c7af2920..02e5aa08294ee0 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -73,8 +73,7 @@ __ia64_sync_icache_dcache (pte_t pte)
  * DMA can be marked as "clean" so that lazy_mmu_prot_update()
doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 {
 	unsigned long pfn = PHYS_PFN(paddr);
 
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 5a3ce2a2479437..738485b3578062 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -153,6 +153,9 @@ static inline void dma_direct_sync_single_for_cpu(struct
device *dev,
 
 	if (unlikely(is_swiotlb_buffer(paddr)))
 		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
+
+	if (dir == DMA_FROM_DEVICE)
+		arch_dma_mark_clean(paddr, size);
 }
 
 static inline dma_addr_t dma_direct_map_page(struct device *dev,
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index ca09a4e07d2d3d..b9bc6c557ea46f 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -108,6 +108,14 @@ static inline void arch_dma_prep_coherent(struct page
*page, size_t size)
 }
 #endif /* CONFIG_ARCH_HAS_DMA_PREP_COHERENT */
 
+#ifdef CONFIG_ARCH_HAS_DMA_MARK_CLEAN
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size);
+#else
+static inline void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
+{
+}
+#endif /* ARCH_HAS_DMA_MARK_CLEAN */
+
 void *arch_dma_set_uncached(void *addr, size_t size);
 void arch_dma_clear_uncached(void *addr, size_t size);
 
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 847a9d1fa6343d..6cf7f7947ae797 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -43,6 +43,12 @@ config ARCH_HAS_DMA_SET_MASK
 config ARCH_HAS_DMA_WRITE_COMBINE
 	bool
 
+#
+# Select if the architectures provides the arch_dma_mark_clean hook
+#
+config ARCH_HAS_DMA_MARK_CLEAN
+	bool
+
 config DMA_DECLARE_COHERENT
 	bool
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index bb0041e9965975..1123e767f4315f 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -340,6 +340,9 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
 		if (unlikely(is_swiotlb_buffer(paddr)))
 			swiotlb_tbl_sync_single(dev, paddr, sg->length, dir,
 					SYNC_FOR_CPU);
+
+		if (dir == DMA_FROM_DEVICE)
+			arch_dma_mark_clean(paddr, sg->length);
 	}
 
 	if (!dev_is_dma_coherent(dev))
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 12/28] dma-direct: remove dma_direct_{alloc, free}_pages
Just merge these helpers into the main dma_direct_{alloc,free} routines,
as the additional checks are always false for the two callers.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 arch/x86/kernel/amd_gart_64.c |  6 +++---
 include/linux/dma-direct.h    |  4 ----
 kernel/dma/direct.c           | 39 ++++++++++++++---------------------
 kernel/dma/pool.c             |  2 +-
 4 files changed, 19 insertions(+), 32 deletions(-)
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index e89031e9c84761..adbf616d35d15d 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -468,7 +468,7 @@ gart_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_addr,
 {
 	void *vaddr;
 
-	vaddr = dma_direct_alloc_pages(dev, size, dma_addr, flag, attrs);
+	vaddr = dma_direct_alloc(dev, size, dma_addr, flag, attrs);
 	if (!vaddr ||
 	    !force_iommu || dev->coherent_dma_mask <= DMA_BIT_MASK(24))
 		return vaddr;
@@ -480,7 +480,7 @@ gart_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_addr,
 		goto out_free;
 	return vaddr;
 out_free:
-	dma_direct_free_pages(dev, size, vaddr, *dma_addr, attrs);
+	dma_direct_free(dev, size, vaddr, *dma_addr, attrs);
 	return NULL;
 }
 
@@ -490,7 +490,7 @@ gart_free_coherent(struct device *dev, size_t size, void
*vaddr,
 		   dma_addr_t dma_addr, unsigned long attrs)
 {
 	gart_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL, 0);
-	dma_direct_free_pages(dev, size, vaddr, dma_addr, attrs);
+	dma_direct_free(dev, size, vaddr, dma_addr, attrs);
 }
 
 static int no_agp;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 738485b3578062..6a96a8ecac7cbc 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -80,10 +80,6 @@ void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t dma_addr, unsigned long attrs);
-void *dma_direct_alloc_pages(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
-void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
-		dma_addr_t dma_addr, unsigned long attrs);
 int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 1123e767f4315f..8da9a62dd9a72c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -151,13 +151,18 @@ static struct page *__dma_direct_alloc_pages(struct device
*dev, size_t size,
 	return page;
 }
 
-void *dma_direct_alloc_pages(struct device *dev, size_t size,
+void *dma_direct_alloc(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
 	struct page *page;
 	void *ret;
 	int err;
 
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
+	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+	    dma_alloc_need_uncached(dev, attrs))
+		return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
+
 	size = PAGE_ALIGN(size);
 
 	if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
@@ -251,11 +256,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t
size,
 	return NULL;
 }
 
-void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
-		dma_addr_t dma_addr, unsigned long attrs)
+void dma_direct_free(struct device *dev, size_t size,
+		void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
 {
 	unsigned int page_order = get_order(size);
 
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
+	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+	    dma_alloc_need_uncached(dev, attrs)) {
+		arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
+		return;
+	}
+
 	/* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
 	if (dma_should_free_from_pool(dev, attrs) &&
 	    dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
@@ -279,27 +291,6 @@ void dma_direct_free_pages(struct device *dev, size_t size,
void *cpu_addr,
 	dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
-void *dma_direct_alloc(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
-{
-	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-	    dma_alloc_need_uncached(dev, attrs))
-		return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
-	return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-}
-
-void dma_direct_free(struct device *dev, size_t size,
-		void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
-{
-	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-	    dma_alloc_need_uncached(dev, attrs))
-		arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
-	else
-		dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
-}
-
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_SWIOTLB)
 void dma_direct_sync_sg_for_device(struct device *dev,
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 6bc74a2d51273e..222cebf1f10548 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -86,7 +86,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t
pool_size,
 #endif
 	/*
 	 * Memory in the atomic DMA pools must be unencrypted, the pools do not
-	 * shrink so no re-encryption occurs in dma_direct_free_pages().
+	 * shrink so no re-encryption occurs in dma_direct_free().
 	 */
 	ret = set_memory_decrypted((unsigned long)page_to_virt(page),
 				   1 << order);
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 13/28] dma-direct: lift gfp_t manipulation out of__dma_direct_alloc_pages
Move the detailed gfp_t setup from __dma_direct_alloc_pages into the
caller to clean things up a little.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 kernel/dma/direct.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8da9a62dd9a72c..01120510968fa1 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -108,7 +108,7 @@ static inline bool dma_should_free_from_pool(struct device
*dev,
 }
 
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
-		gfp_t gfp, unsigned long attrs)
+		gfp_t gfp)
 {
 	int node = dev_to_node(dev);
 	struct page *page = NULL;
@@ -116,11 +116,6 @@ static struct page *__dma_direct_alloc_pages(struct device
*dev, size_t size,
 
 	WARN_ON_ONCE(!PAGE_ALIGNED(size));
 
-	if (attrs & DMA_ATTR_NO_WARN)
-		gfp |= __GFP_NOWARN;
-
-	/* we always manually zero the memory once we are done: */
-	gfp &= ~__GFP_ZERO;
 	gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
 					   &phys_limit);
 	page = dma_alloc_contiguous(dev, size, gfp);
@@ -164,6 +159,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 		return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
 	size = PAGE_ALIGN(size);
+	if (attrs & DMA_ATTR_NO_WARN)
+		gfp |= __GFP_NOWARN;
 
 	if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
 		ret = dma_alloc_from_pool(dev, size, &page, gfp);
@@ -172,7 +169,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 		goto done;
 	}
 
-	page = __dma_direct_alloc_pages(dev, size, gfp, attrs);
+	/* we always manually zero the memory once we are done */
+	page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
 	if (!page)
 		return NULL;
 
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 14/28] dma-direct: use phys_to_dma_direct in dma_direct_alloc
Replace the currently open code copy. Signed-off-by: Christoph Hellwig <hch at lst.de> --- kernel/dma/direct.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 01120510968fa1..2e280b9c063449 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -235,10 +235,7 @@ void *dma_direct_alloc(struct device *dev, size_t size, goto out_encrypt_pages; } done: - if (force_dma_unencrypted(dev)) - *dma_handle = __phys_to_dma(dev, page_to_phys(page)); - else - *dma_handle = phys_to_dma(dev, page_to_phys(page)); + *dma_handle = phys_to_dma_direct(dev, page_to_phys(page)); return ret; out_encrypt_pages: -- 2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 15/28] dma-direct: remove __dma_to_phys
There is no harm in just always clearing the SME encryption bit, while
significantly simplifying the interface.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 arch/arm/include/asm/dma-direct.h      |  2 +-
 arch/mips/bmips/dma.c                  |  2 +-
 arch/mips/cavium-octeon/dma-octeon.c   |  2 +-
 arch/mips/include/asm/dma-direct.h     |  2 +-
 arch/mips/loongson2ef/fuloong-2e/dma.c |  2 +-
 arch/mips/loongson2ef/lemote-2f/dma.c  |  2 +-
 arch/mips/loongson64/dma.c             |  2 +-
 arch/mips/pci/pci-ar2315.c             |  2 +-
 arch/mips/pci/pci-xtalk-bridge.c       |  2 +-
 arch/mips/sgi-ip32/ip32-dma.c          |  2 +-
 arch/powerpc/include/asm/dma-direct.h  |  2 +-
 include/linux/dma-direct.h             | 21 +++++++++++----------
 kernel/dma/direct.c                    |  6 +-----
 13 files changed, 23 insertions(+), 26 deletions(-)
diff --git a/arch/arm/include/asm/dma-direct.h
b/arch/arm/include/asm/dma-direct.h
index 7c3001a6a775bf..a8cee87a93e8ab 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -8,7 +8,7 @@ static inline dma_addr_t __phys_to_dma(struct device *dev,
phys_addr_t paddr)
 	return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
 }
 
-static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t
dev_addr)
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
 	unsigned int offset = dev_addr & ~PAGE_MASK;
 	return __pfn_to_phys(dma_to_pfn(dev, dev_addr)) + offset;
diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index df56bf4179e347..ba2a5d33dfd3fa 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -52,7 +52,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t pa)
 	return pa;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	struct bmips_dma_range *r;
 
diff --git a/arch/mips/cavium-octeon/dma-octeon.c
b/arch/mips/cavium-octeon/dma-octeon.c
index 14ea680d180e07..388b13ba2558c2 100644
--- a/arch/mips/cavium-octeon/dma-octeon.c
+++ b/arch/mips/cavium-octeon/dma-octeon.c
@@ -177,7 +177,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t
paddr)
 	return paddr;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
 #ifdef CONFIG_PCI
 	if (dev && dev_is_pci(dev))
diff --git a/arch/mips/include/asm/dma-direct.h
b/arch/mips/include/asm/dma-direct.h
index 14e352651ce946..8e178651c638c2 100644
--- a/arch/mips/include/asm/dma-direct.h
+++ b/arch/mips/include/asm/dma-direct.h
@@ -3,6 +3,6 @@
 #define _MIPS_DMA_DIRECT_H 1
 
 dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr);
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr);
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
 
 #endif /* _MIPS_DMA_DIRECT_H */
diff --git a/arch/mips/loongson2ef/fuloong-2e/dma.c
b/arch/mips/loongson2ef/fuloong-2e/dma.c
index e122292bf6660a..83fadeb3fd7d56 100644
--- a/arch/mips/loongson2ef/fuloong-2e/dma.c
+++ b/arch/mips/loongson2ef/fuloong-2e/dma.c
@@ -6,7 +6,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return paddr | 0x80000000;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	return dma_addr & 0x7fffffff;
 }
diff --git a/arch/mips/loongson2ef/lemote-2f/dma.c
b/arch/mips/loongson2ef/lemote-2f/dma.c
index abf0e39d7e4696..302b43a14eee74 100644
--- a/arch/mips/loongson2ef/lemote-2f/dma.c
+++ b/arch/mips/loongson2ef/lemote-2f/dma.c
@@ -6,7 +6,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return paddr | 0x80000000;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	if (dma_addr > 0x8fffffff)
 		return dma_addr;
diff --git a/arch/mips/loongson64/dma.c b/arch/mips/loongson64/dma.c
index dbfe6e82fddd1c..b3dc5d0bd2b113 100644
--- a/arch/mips/loongson64/dma.c
+++ b/arch/mips/loongson64/dma.c
@@ -13,7 +13,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t
paddr)
 	return ((nid << 44) ^ paddr) | (nid << node_id_offset);
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
 	/* We extract 2bit node id (bit 44~47, only bit 44~45 used now) from
 	 * Loongson-3's 48bit address space and embed it into 40bit */
diff --git a/arch/mips/pci/pci-ar2315.c b/arch/mips/pci/pci-ar2315.c
index 490953f515282a..d88395684f487d 100644
--- a/arch/mips/pci/pci-ar2315.c
+++ b/arch/mips/pci/pci-ar2315.c
@@ -175,7 +175,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t
paddr)
 	return paddr + ar2315_dev_offset(dev);
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	return dma_addr - ar2315_dev_offset(dev);
 }
diff --git a/arch/mips/pci/pci-xtalk-bridge.c b/arch/mips/pci/pci-xtalk-bridge.c
index 9b3cc775c55e05..f1b37f32b55395 100644
--- a/arch/mips/pci/pci-xtalk-bridge.c
+++ b/arch/mips/pci/pci-xtalk-bridge.c
@@ -33,7 +33,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t
paddr)
 	return bc->baddr + paddr;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	return dma_addr & ~(0xffUL << 56);
 }
diff --git a/arch/mips/sgi-ip32/ip32-dma.c b/arch/mips/sgi-ip32/ip32-dma.c
index fa7b17cb53853e..160317294d97a9 100644
--- a/arch/mips/sgi-ip32/ip32-dma.c
+++ b/arch/mips/sgi-ip32/ip32-dma.c
@@ -27,7 +27,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t
paddr)
 	return dma_addr;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	phys_addr_t paddr = dma_addr & RAM_OFFSET_MASK;
 
diff --git a/arch/powerpc/include/asm/dma-direct.h
b/arch/powerpc/include/asm/dma-direct.h
index abc154d784b078..95b09313d2a4cf 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -7,7 +7,7 @@ static inline dma_addr_t __phys_to_dma(struct device *dev,
phys_addr_t paddr)
 	return paddr + dev->archdata.dma_offset;
 }
 
-static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
 	return daddr - dev->archdata.dma_offset;
 }
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6a96a8ecac7cbc..811582a39e291f 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -24,11 +24,17 @@ static inline dma_addr_t __phys_to_dma(struct device *dev,
phys_addr_t paddr)
 	return dev_addr - ((dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
 }
 
-static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t
dev_addr)
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
-	phys_addr_t paddr = (phys_addr_t)dev_addr;
-
-	return paddr + ((phys_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
+	phys_addr_t paddr = (phys_addr_t)dev_addr +
+		((phys_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
+
+	/*
+	 * Clear the Memory encryption mask if support by the architecture.  We
+	 * do this unconditionally so that we don't have to track if someone
+	 * fed us an encrypted or unencryped DMA address.
+	 */
+	return __sme_clr(paddr);
 }
 #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
@@ -44,7 +50,7 @@ static inline bool force_dma_unencrypted(struct device *dev)
 /*
  * If memory encryption is supported, phys_to_dma will set the memory
encryption
  * bit in the DMA address, and dma_to_phys will clear it.  The raw
__phys_to_dma
- * and __dma_to_phys versions should only be used on non-encrypted memory for
+ * version should only be used on non-encrypted memory for
  * special occasions like DMA coherent buffers.
  */
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
@@ -52,11 +58,6 @@ static inline dma_addr_t phys_to_dma(struct device *dev,
phys_addr_t paddr)
 	return __sme_set(__phys_to_dma(dev, paddr));
 }
 
-static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
-{
-	return __sme_clr(__dma_to_phys(dev, daddr));
-}
-
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t
size,
 		bool is_ram)
 {
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2e280b9c063449..a97835983a34f7 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -48,11 +48,6 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64
dma_mask,
 {
 	u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
 
-	if (force_dma_unencrypted(dev))
-		*phys_limit = __dma_to_phys(dev, dma_limit);
-	else
-		*phys_limit = dma_to_phys(dev, dma_limit);
-
 	/*
 	 * Optimistically try the zone that the physical address mask falls
 	 * into first.  If that returns memory that isn't actually addressable
@@ -61,6 +56,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64
dma_mask,
 	 * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
 	 * zones.
 	 */
+	*phys_limit = dma_to_phys(dev, dma_limit);
 	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
 		return GFP_DMA;
 	if (*phys_limit <= DMA_BIT_MASK(32))
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 16/28] dma-direct: rename and cleanup __phys_to_dma
The __phys_to_dma vs phys_to_dma distinction isn't exactly obvious.  Try
to improve the situation by renaming __phys_to_dma to
phys_to_dma_unencryped, and not forcing architectures that want to
override phys_to_dma to actually provide __phys_to_dma.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 arch/arm/include/asm/dma-direct.h      |  2 +-
 arch/mips/bmips/dma.c                  |  2 +-
 arch/mips/cavium-octeon/dma-octeon.c   |  2 +-
 arch/mips/include/asm/dma-direct.h     |  2 +-
 arch/mips/loongson2ef/fuloong-2e/dma.c |  2 +-
 arch/mips/loongson2ef/lemote-2f/dma.c  |  2 +-
 arch/mips/loongson64/dma.c             |  2 +-
 arch/mips/pci/pci-ar2315.c             |  2 +-
 arch/mips/pci/pci-xtalk-bridge.c       |  2 +-
 arch/mips/sgi-ip32/ip32-dma.c          |  2 +-
 arch/powerpc/include/asm/dma-direct.h  |  2 +-
 drivers/iommu/intel/iommu.c            |  2 +-
 include/linux/dma-direct.h             | 28 +++++++++++++++-----------
 kernel/dma/direct.c                    |  8 ++++----
 kernel/dma/swiotlb.c                   |  4 ++--
 15 files changed, 34 insertions(+), 30 deletions(-)
diff --git a/arch/arm/include/asm/dma-direct.h
b/arch/arm/include/asm/dma-direct.h
index a8cee87a93e8ab..bca0de56753439 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -2,7 +2,7 @@
 #ifndef ASM_ARM_DMA_DIRECT_H
 #define ASM_ARM_DMA_DIRECT_H 1
 
-static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	unsigned int offset = paddr & ~PAGE_MASK;
 	return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index ba2a5d33dfd3fa..49061b870680b9 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -40,7 +40,7 @@ static struct bmips_dma_range *bmips_dma_ranges;
 
 #define FLUSH_RAC		0x100
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t pa)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t pa)
 {
 	struct bmips_dma_range *r;
 
diff --git a/arch/mips/cavium-octeon/dma-octeon.c
b/arch/mips/cavium-octeon/dma-octeon.c
index 388b13ba2558c2..232fa1017b1ec9 100644
--- a/arch/mips/cavium-octeon/dma-octeon.c
+++ b/arch/mips/cavium-octeon/dma-octeon.c
@@ -168,7 +168,7 @@ void __init octeon_pci_dma_init(void)
 }
 #endif /* CONFIG_PCI */
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 #ifdef CONFIG_PCI
 	if (dev && dev_is_pci(dev))
diff --git a/arch/mips/include/asm/dma-direct.h
b/arch/mips/include/asm/dma-direct.h
index 8e178651c638c2..9a640118316c9d 100644
--- a/arch/mips/include/asm/dma-direct.h
+++ b/arch/mips/include/asm/dma-direct.h
@@ -2,7 +2,7 @@
 #ifndef _MIPS_DMA_DIRECT_H
 #define _MIPS_DMA_DIRECT_H 1
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr);
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr);
 phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
 
 #endif /* _MIPS_DMA_DIRECT_H */
diff --git a/arch/mips/loongson2ef/fuloong-2e/dma.c
b/arch/mips/loongson2ef/fuloong-2e/dma.c
index 83fadeb3fd7d56..cea167d8aba8db 100644
--- a/arch/mips/loongson2ef/fuloong-2e/dma.c
+++ b/arch/mips/loongson2ef/fuloong-2e/dma.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/dma-direct.h>
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	return paddr | 0x80000000;
 }
diff --git a/arch/mips/loongson2ef/lemote-2f/dma.c
b/arch/mips/loongson2ef/lemote-2f/dma.c
index 302b43a14eee74..3c9e994563578c 100644
--- a/arch/mips/loongson2ef/lemote-2f/dma.c
+++ b/arch/mips/loongson2ef/lemote-2f/dma.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/dma-direct.h>
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	return paddr | 0x80000000;
 }
diff --git a/arch/mips/loongson64/dma.c b/arch/mips/loongson64/dma.c
index b3dc5d0bd2b113..364f2f27c8723f 100644
--- a/arch/mips/loongson64/dma.c
+++ b/arch/mips/loongson64/dma.c
@@ -4,7 +4,7 @@
 #include <linux/swiotlb.h>
 #include <boot_param.h>
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	/* We extract 2bit node id (bit 44~47, only bit 44~45 used now) from
 	 * Loongson-3's 48bit address space and embed it into 40bit */
diff --git a/arch/mips/pci/pci-ar2315.c b/arch/mips/pci/pci-ar2315.c
index d88395684f487d..cef4a47ab06311 100644
--- a/arch/mips/pci/pci-ar2315.c
+++ b/arch/mips/pci/pci-ar2315.c
@@ -170,7 +170,7 @@ static inline dma_addr_t ar2315_dev_offset(struct device
*dev)
 	return 0;
 }
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	return paddr + ar2315_dev_offset(dev);
 }
diff --git a/arch/mips/pci/pci-xtalk-bridge.c b/arch/mips/pci/pci-xtalk-bridge.c
index f1b37f32b55395..50f7d42cca5a78 100644
--- a/arch/mips/pci/pci-xtalk-bridge.c
+++ b/arch/mips/pci/pci-xtalk-bridge.c
@@ -25,7 +25,7 @@
 /*
  * Common phys<->dma mapping for platforms using pci xtalk bridge
  */
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct bridge_controller *bc = BRIDGE_CONTROLLER(pdev->bus);
diff --git a/arch/mips/sgi-ip32/ip32-dma.c b/arch/mips/sgi-ip32/ip32-dma.c
index 160317294d97a9..20c6da9d76bc5e 100644
--- a/arch/mips/sgi-ip32/ip32-dma.c
+++ b/arch/mips/sgi-ip32/ip32-dma.c
@@ -18,7 +18,7 @@
 
 #define RAM_OFFSET_MASK 0x3fffffffUL
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	dma_addr_t dma_addr = paddr & RAM_OFFSET_MASK;
 
diff --git a/arch/powerpc/include/asm/dma-direct.h
b/arch/powerpc/include/asm/dma-direct.h
index 95b09313d2a4cf..128304cbee1d87 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -2,7 +2,7 @@
 #ifndef ASM_POWERPC_DMA_DIRECT_H
 #define ASM_POWERPC_DMA_DIRECT_H 1
 
-static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	return paddr + dev->archdata.dma_offset;
 }
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index e9864e52b0e96a..99aa80456b7145 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3736,7 +3736,7 @@ bounce_map_single(struct device *dev, phys_addr_t paddr,
size_t size,
 	 */
 	if (!IS_ALIGNED(paddr | size, VTD_PAGE_SIZE)) {
 		tlb_addr = swiotlb_tbl_map_single(dev,
-				__phys_to_dma(dev, io_tlb_start),
+				phys_to_dma_unencrypted(dev, io_tlb_start),
 				paddr, size, aligned_size, dir, attrs);
 		if (tlb_addr == DMA_MAPPING_ERROR) {
 			goto swiotlb_error;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 811582a39e291f..3797ecccc15466 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -16,14 +16,29 @@ extern unsigned int zone_dma_bits;
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include <asm/dma-direct.h>
+#ifndef phys_to_dma_unencrypted
+#define phys_to_dma_unencrypted		phys_to_dma
+#endif
 #else
-static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
+		phys_addr_t paddr)
 {
 	dma_addr_t dev_addr = (dma_addr_t)paddr;
 
 	return dev_addr - ((dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
 }
 
+/*
+ * If memory encryption is supported, phys_to_dma will set the memory
encryption
+ * bit in the DMA address, and dma_to_phys will clear it.
+ * phys_to_dma_unencrypted is for use on special unencrypted memory like
swiotlb
+ * buffers.
+ */
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
+{
+	return __sme_set(phys_to_dma_unencrypted(dev, paddr));
+}
+
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
 	phys_addr_t paddr = (phys_addr_t)dev_addr +
@@ -47,17 +62,6 @@ static inline bool force_dma_unencrypted(struct device *dev)
 }
 #endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 
-/*
- * If memory encryption is supported, phys_to_dma will set the memory
encryption
- * bit in the DMA address, and dma_to_phys will clear it.  The raw
__phys_to_dma
- * version should only be used on non-encrypted memory for
- * special occasions like DMA coherent buffers.
- */
-static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
-{
-	return __sme_set(__phys_to_dma(dev, paddr));
-}
-
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t
size,
 		bool is_ram)
 {
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a97835983a34f7..e7963e51660792 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -25,7 +25,7 @@ static inline dma_addr_t phys_to_dma_direct(struct device
*dev,
 		phys_addr_t phys)
 {
 	if (force_dma_unencrypted(dev))
-		return __phys_to_dma(dev, phys);
+		return phys_to_dma_unencrypted(dev, phys);
 	return phys_to_dma(dev, phys);
 }
 
@@ -433,13 +433,13 @@ int dma_direct_supported(struct device *dev, u64 mask)
 		return 1;
 
 	/*
-	 * This check needs to be against the actual bit mask value, so
-	 * use __phys_to_dma() here so that the SME encryption mask isn't
+	 * This check needs to be against the actual bit mask value, so use
+	 * phys_to_dma_unencrypted() here so that the SME encryption mask isn't
 	 * part of the check.
 	 */
 	if (IS_ENABLED(CONFIG_ZONE_DMA))
 		min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
-	return mask >= __phys_to_dma(dev, min_mask);
+	return mask >= phys_to_dma_unencrypted(dev, min_mask);
 }
 
 size_t dma_direct_max_mapping_size(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c19379fabd200e..4ea72d145cd27d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -670,13 +670,13 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t
paddr, size_t size,
 			      swiotlb_force);
 
 	swiotlb_addr = swiotlb_tbl_map_single(dev,
-			__phys_to_dma(dev, io_tlb_start),
+			phys_to_dma_unencrypted(dev, io_tlb_start),
 			paddr, size, size, dir, attrs);
 	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
 	/* Ensure that the address returned is DMA'ble */
-	dma_addr = __phys_to_dma(dev, swiotlb_addr);
+	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
 	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
 		swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, size, dir,
 			attrs | DMA_ATTR_SKIP_CPU_SYNC);
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 17/28] dma-mapping: move dma_common_{mmap, get_sgtable} out of mapping.c
Add a new file that contains helpera for misc DMA ops, which is only
built when CONFIG_DMA_OPS is set.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 kernel/dma/Makefile      |  1 +
 kernel/dma/mapping.c     | 47 +-----------------------------------
 kernel/dma/ops_helpers.c | 51 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 46 deletions(-)
 create mode 100644 kernel/dma/ops_helpers.c
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index 32c7c1942bbd6c..dc755ab68aabf9 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_HAS_DMA)			+= mapping.o direct.o
+obj-$(CONFIG_DMA_OPS)			+= ops_helpers.o
 obj-$(CONFIG_DMA_OPS)			+= dummy.o
 obj-$(CONFIG_DMA_CMA)			+= contiguous.o
 obj-$(CONFIG_DMA_DECLARE_COHERENT)	+= coherent.o
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0d129421e75fc8..848c95c27d79ff 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -8,7 +8,7 @@
 #include <linux/memblock.h> /* for max_pfn */
 #include <linux/acpi.h>
 #include <linux/dma-direct.h>
-#include <linux/dma-noncoherent.h>
+#include <linux/dma-mapping.h>
 #include <linux/export.h>
 #include <linux/gfp.h>
 #include <linux/of_device.h>
@@ -295,22 +295,6 @@ void dma_sync_sg_for_device(struct device *dev, struct
scatterlist *sg,
 }
 EXPORT_SYMBOL(dma_sync_sg_for_device);
 
-/*
- * Create scatter-list for the already allocated DMA buffer.
- */
-int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
-		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
-		 unsigned long attrs)
-{
-	struct page *page = virt_to_page(cpu_addr);
-	int ret;
-
-	ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-	if (!ret)
-		sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-	return ret;
-}
-
 /*
  * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems
  * that the intention is to allow exporting memory allocated via the
@@ -358,35 +342,6 @@ pgprot_t dma_pgprot(struct device *dev, pgprot_t prot,
unsigned long attrs)
 }
 #endif /* CONFIG_MMU */
 
-/*
- * Create userspace mapping for the DMA-coherent memory.
- */
-int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
-		void *cpu_addr, dma_addr_t dma_addr, size_t size,
-		unsigned long attrs)
-{
-#ifdef CONFIG_MMU
-	unsigned long user_count = vma_pages(vma);
-	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	unsigned long off = vma->vm_pgoff;
-	int ret = -ENXIO;
-
-	vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
-
-	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
-		return ret;
-
-	if (off >= count || user_count > count - off)
-		return -ENXIO;
-
-	return remap_pfn_range(vma, vma->vm_start,
-			page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
-			user_count << PAGE_SHIFT, vma->vm_page_prot);
-#else
-	return -ENXIO;
-#endif /* CONFIG_MMU */
-}
-
 /**
  * dma_can_mmap - check if a given device supports dma_mmap_*
  * @dev: device to check
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
new file mode 100644
index 00000000000000..e443c69be4299f
--- /dev/null
+++ b/kernel/dma/ops_helpers.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Helpers for DMA ops implementations.  These generally rely on the fact that
+ * the allocated memory contains normal pages in the direct kernel mapping.
+ */
+#include <linux/dma-noncoherent.h>
+
+/*
+ * Create scatter-list for the already allocated DMA buffer.
+ */
+int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
+		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		 unsigned long attrs)
+{
+	struct page *page = virt_to_page(cpu_addr);
+	int ret;
+
+	ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
+	if (!ret)
+		sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+	return ret;
+}
+
+/*
+ * Create userspace mapping for the DMA-coherent memory.
+ */
+int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs)
+{
+#ifdef CONFIG_MMU
+	unsigned long user_count = vma_pages(vma);
+	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	unsigned long off = vma->vm_pgoff;
+	int ret = -ENXIO;
+
+	vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
+
+	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	if (off >= count || user_count > count - off)
+		return -ENXIO;
+
+	return remap_pfn_range(vma, vma->vm_start,
+			page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
+			user_count << PAGE_SHIFT, vma->vm_page_prot);
+#else
+	return -ENXIO;
+#endif /* CONFIG_MMU */
+}
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 18/28] dma-mapping: move the dma_declare_coherent_memory documentation
dma_declare_coherent_memory should not be in a DMA API guide aimed
at driver writers (that is consumers of the API).  Move it to a comment
near the function instead.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 Documentation/core-api/dma-api.rst | 24 ------------------------
 kernel/dma/coherent.c              | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 24 deletions(-)
diff --git a/Documentation/core-api/dma-api.rst
b/Documentation/core-api/dma-api.rst
index 3b3abbbb4b9a6f..90239348b30f6f 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -586,30 +586,6 @@ the DMA_ATTR_NON_CONSISTENT flag starting at virtual
address vaddr and
 continuing on for size.  Again, you *must* observe the cache line
 boundaries when doing this.
 
-::
-
-	int
-	dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
-				    dma_addr_t device_addr, size_t size);
-
-Declare region of memory to be handed out by dma_alloc_coherent() when
-it's asked for coherent memory for this device.
-
-phys_addr is the CPU physical address to which the memory is currently
-assigned (this will be ioremapped so the CPU can access the region).
-
-device_addr is the DMA address the device needs to be programmed
-with to actually address this memory (this will be handed out as the
-dma_addr_t in dma_alloc_coherent()).
-
-size is the size of the area (must be multiples of PAGE_SIZE).
-
-As a simplification for the platforms, only *one* such region of
-memory may be declared per device.
-
-For reasons of efficiency, most platforms choose to track the declared
-region only at the granularity of a page.  For smaller allocations,
-you should use the dma_pool() API.
 
 Part III - Debug drivers use of the DMA-API
 -------------------------------------------
diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 2a0c4985f38e41..f85d14bbfcbe03 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -107,6 +107,23 @@ static int dma_assign_coherent_memory(struct device *dev,
 	return 0;
 }
 
+/*
+ * Declare a region of memory to be handed out by dma_alloc_coherent() when it
+ * is asked for coherent memory for this device.  This shall only be used
+ * from platform code, usually based on the device tree description.
+ * 
+ * phys_addr is the CPU physical address to which the memory is currently
+ * assigned (this will be ioremapped so the CPU can access the region).
+ *
+ * device_addr is the DMA address the device needs to be programmed with to
+ * actually address this memory (this will be handed out as the dma_addr_t in
+ * dma_alloc_coherent()).
+ *
+ * size is the size of the area (must be a multiple of PAGE_SIZE).
+ *
+ * As a simplification for the platforms, only *one* such region of memory may
+ * be declared per device.
+ */
 int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
 				dma_addr_t device_addr, size_t size)
 {
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages
Add a new API to allocate and free pages that are guaranteed to be
addressable by a device, but otherwise behave like pages allocated by
alloc_pages.  The intended APIs to sync them for use with the device
and cpu are dma_sync_single_for_{device,cpu} that are also used for
streaming mappings.
Switch all drivers over to this new API, but keep the usage of the
crufty dma_cache_sync API for now, which will be cleaned up on a driver
by driver basis.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 Documentation/core-api/dma-api.rst        | 68 +++++++++++------------
 Documentation/core-api/dma-attributes.rst |  8 ---
 arch/alpha/kernel/pci_iommu.c             |  2 +
 arch/arm/mm/dma-mapping-nommu.c           |  2 +
 arch/arm/mm/dma-mapping.c                 |  4 ++
 arch/ia64/hp/common/sba_iommu.c           |  2 +
 arch/mips/jazz/jazzdma.c                  |  7 +--
 arch/powerpc/kernel/dma-iommu.c           |  2 +
 arch/powerpc/platforms/ps3/system-bus.c   |  4 ++
 arch/powerpc/platforms/pseries/vio.c      |  2 +
 arch/s390/pci/pci_dma.c                   |  2 +
 arch/x86/kernel/amd_gart_64.c             |  2 +
 drivers/iommu/dma-iommu.c                 |  2 +
 drivers/iommu/intel/iommu.c               |  4 ++
 drivers/net/ethernet/i825xx/lasi_82596.c  | 13 ++---
 drivers/net/ethernet/seeq/sgiseeq.c       | 12 ++--
 drivers/parisc/ccio-dma.c                 |  2 +
 drivers/parisc/sba_iommu.c                |  2 +
 drivers/scsi/53c700.c                     |  8 +--
 drivers/scsi/sgiwd93.c                    | 12 ++--
 drivers/xen/swiotlb-xen.c                 |  2 +
 include/linux/dma-direct.h                |  5 ++
 include/linux/dma-mapping.h               | 29 ++++++++--
 include/linux/dma-noncoherent.h           |  3 -
 kernel/dma/direct.c                       | 51 ++++++++++++++++-
 kernel/dma/mapping.c                      | 43 +++++++++++++-
 kernel/dma/ops_helpers.c                  | 35 ++++++++++++
 kernel/dma/virt.c                         |  2 +
 sound/mips/hal2.c                         | 20 +++----
 29 files changed, 254 insertions(+), 96 deletions(-)
diff --git a/Documentation/core-api/dma-api.rst
b/Documentation/core-api/dma-api.rst
index 90239348b30f6f..047fcfffa0e5cf 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -516,48 +516,53 @@ routines, e.g.:::
 	}
 
 
-Part II - Advanced dma usage
-----------------------------
+Part II - Non-coherent DMA allocations
+--------------------------------------
 
-Warning: These pieces of the DMA API should not be used in the
-majority of cases, since they cater for unlikely corner cases that
-don't belong in usual drivers.
+These APIs allow to allocate pages that can be used like normal pages
+in the kernel direct mapping, but are guaranteed to be DMA addressable.
 
 If you don't understand how cache line coherency works between a
 processor and an I/O device, you should not be using this part of the
-API at all.
+API.
 
 ::
 
 	void *
-	dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
-			gfp_t flag, unsigned long attrs)
+	dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+			enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates a region of <size> bytes of consistent memory.  It
+returns a pointer to the allocated region (in the processor's virtual
address
+space) or NULL if the allocation failed. The returned memory is guanteed to
+behave like memory allocated using alloc_pages.
+
+It also returns a <dma_handle> which may be cast to an unsigned integer
the
+same width as the bus and given to the device as the DMA address base of
+the region.
 
-Identical to dma_alloc_coherent() except that when the
-DMA_ATTR_NON_CONSISTENT flags is passed in the attrs argument, the
-platform will choose to return either consistent or non-consistent memory
-as it sees fit.  By using this API, you are guaranteeing to the platform
-that you have all the correct and necessary sync points for this memory
-in the driver should it choose to return non-consistent memory.
+The dir parameter specified if data is read and/or written by the device,
+see dma_map_single() for details.
 
-Note: where the platform can return consistent memory, it will
-guarantee that the sync points become nops.
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
 
-Warning:  Handling non-consistent memory is a real pain.  You should
-only use this API if you positively know your driver will be
-required to work on one of the rare (usually non-PCI) architectures
-that simply cannot make consistent memory.
+Before giving the memory to the device, dma_sync_single_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
+reused.
 
 ::
 
 	void
-	dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-		       dma_addr_t dma_handle, unsigned long attrs)
+	dma_free_pages(struct device *dev, size_t size, void *cpu_addr,
+			dma_addr_t dma_handle, enum dma_data_direction dir)
 
-Free memory allocated by the dma_alloc_attrs().  All common
-parameters must be identical to those otherwise passed to dma_free_coherent,
-and the attrs argument must be identical to the attrs passed to
-dma_alloc_attrs().
+Free a region of memory previously allocated using dma_alloc_pages().  dev,
+size and dma_handle and dir must all be the same as those passed into
+dma_alloc_pages().  cpu_addr must be the virtual address returned by
+the dma_alloc_pages().
 
 ::
 
@@ -575,17 +580,6 @@ memory or doing partial flushes.
 	into the width returned by this call.  It will also always be a power
 	of two for easy alignment.
 
-::
-
-	void
-	dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		       enum dma_data_direction direction)
-
-Do a partial sync of memory that was allocated by dma_alloc_attrs() with
-the DMA_ATTR_NON_CONSISTENT flag starting at virtual address vaddr and
-continuing on for size.  Again, you *must* observe the cache line
-boundaries when doing this.
-
 
 Part III - Debug drivers use of the DMA-API
 -------------------------------------------
diff --git a/Documentation/core-api/dma-attributes.rst
b/Documentation/core-api/dma-attributes.rst
index 29dcbe8826e85e..1887d92e8e9269 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -25,14 +25,6 @@ Since it is optional for platforms to implement
DMA_ATTR_WRITE_COMBINE,
 those that do not will simply ignore the attribute and exhibit default
 behavior.
 
-DMA_ATTR_NON_CONSISTENT
------------------------
-
-DMA_ATTR_NON_CONSISTENT lets the platform to choose to return either
-consistent or non-consistent memory as it sees fit.  By using this API,
-you are guaranteeing to the platform that you have all the correct and
-necessary sync points for this memory in the driver.
-
 DMA_ATTR_NO_KERNEL_MAPPING
 --------------------------
 
diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 81037907268d5c..291121e3b5a583 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -957,5 +957,7 @@ const struct dma_map_ops alpha_pci_ops = {
 	.dma_supported		= alpha_pci_supported,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
 EXPORT_SYMBOL(alpha_pci_ops);
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 287ef898a55e11..43c6d66b6e733a 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -176,6 +176,8 @@ static void arm_nommu_dma_sync_sg_for_cpu(struct device
*dev, struct scatterlist
 const struct dma_map_ops arm_nommu_dma_ops = {
 	.alloc			= arm_nommu_dma_alloc,
 	.free			= arm_nommu_dma_free,
+	.alloc_pages		= dma_direct_alloc_pages,
+	.free_pages		= dma_direct_free_pages,
 	.mmap			= arm_nommu_dma_mmap,
 	.map_page		= arm_nommu_dma_map_page,
 	.unmap_page		= arm_nommu_dma_unmap_page,
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8a8949174b1c06..7738b4d23f692f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -199,6 +199,8 @@ static int arm_dma_supported(struct device *dev, u64 mask)
 const struct dma_map_ops arm_dma_ops = {
 	.alloc			= arm_dma_alloc,
 	.free			= arm_dma_free,
+	.alloc_pages		= dma_direct_alloc_pages,
+	.free_pages		= dma_direct_free_pages,
 	.mmap			= arm_dma_mmap,
 	.get_sgtable		= arm_dma_get_sgtable,
 	.map_page		= arm_dma_map_page,
@@ -226,6 +228,8 @@ static int arm_coherent_dma_mmap(struct device *dev, struct
vm_area_struct *vma,
 const struct dma_map_ops arm_coherent_dma_ops = {
 	.alloc			= arm_coherent_dma_alloc,
 	.free			= arm_coherent_dma_free,
+	.alloc_pages		= dma_direct_alloc_pages,
+	.free_pages		= dma_direct_free_pages,
 	.mmap			= arm_coherent_dma_mmap,
 	.get_sgtable		= arm_dma_get_sgtable,
 	.map_page		= arm_coherent_dma_map_page,
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 656a4888c300b5..a51adc11a0b54b 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2071,6 +2071,8 @@ static const struct dma_map_ops sba_dma_ops = {
 	.dma_supported		= sba_dma_supported,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
 
 static int __init
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index d0b5a2ba2b1a8a..0f9a9cb7fe7a95 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -505,9 +505,6 @@ static void *jazz_dma_alloc(struct device *dev, size_t size,
 	*dma_handle = vdma_alloc(virt_to_phys(ret), size);
 	if (*dma_handle == DMA_MAPPING_ERROR)
 		goto out_free_pages;
-
-	if (attrs & DMA_ATTR_NON_CONSISTENT)
-		return ret;
 	arch_dma_prep_coherent(page, size);
 	return (void *)(UNCAC_BASE + __pa(ret));
 
@@ -520,8 +517,6 @@ static void jazz_dma_free(struct device *dev, size_t size,
void *vaddr,
 		dma_addr_t dma_handle, unsigned long attrs)
 {
 	vdma_free(dma_handle);
-	if (!(attrs & DMA_ATTR_NON_CONSISTENT))
-		vaddr = __va(vaddr - UNCAC_BASE);
 	__free_pages(virt_to_page(vaddr), get_order(size));
 }
 
@@ -622,5 +617,7 @@ const struct dma_map_ops jazz_dma_ops = {
 	.cache_sync		= arch_dma_cache_sync,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
 EXPORT_SYMBOL(jazz_dma_ops);
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 569fecd7b5b234..d4e702d74b3393 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -137,4 +137,6 @@ const struct dma_map_ops dma_iommu_ops = {
 	.get_required_mask	= dma_iommu_get_required_mask,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
diff --git a/arch/powerpc/platforms/ps3/system-bus.c
b/arch/powerpc/platforms/ps3/system-bus.c
index 3542b7bd6a4689..7bc5f9be3e12d8 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -696,6 +696,8 @@ static const struct dma_map_ops ps3_sb_dma_ops = {
 	.unmap_page = ps3_unmap_page,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
+	.alloc_pages = dma_common_alloc_pages,
+	.free_pages = dma_common_free_pages,
 };
 
 static const struct dma_map_ops ps3_ioc0_dma_ops = {
@@ -708,6 +710,8 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = {
 	.unmap_page = ps3_unmap_page,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
+	.alloc_pages = dma_common_alloc_pages,
+	.free_pages = dma_common_free_pages,
 };
 
 /**
diff --git a/arch/powerpc/platforms/pseries/vio.c
b/arch/powerpc/platforms/pseries/vio.c
index 0487b26f6f1af3..98ed7b09b3fe50 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -608,6 +608,8 @@ static const struct dma_map_ops vio_dma_mapping_ops = {
 	.get_required_mask = dma_iommu_get_required_mask,
 	.mmap		   = dma_common_mmap,
 	.get_sgtable	   = dma_common_get_sgtable,
+	.alloc_pages	   = dma_common_alloc_pages,
+	.free_pages	   = dma_common_free_pages,
 };
 
 /**
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 64b1399a73f04d..44004f790bdc44 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -670,6 +670,8 @@ const struct dma_map_ops s390_pci_dma_ops = {
 	.unmap_page	= s390_dma_unmap_pages,
 	.mmap		= dma_common_mmap,
 	.get_sgtable	= dma_common_get_sgtable,
+	.alloc_pages	= dma_common_alloc_pages,
+	.free_pages	= dma_common_free_pages,
 	/* dma_supported is unconditionally true without a callback */
 };
 EXPORT_SYMBOL_GPL(s390_pci_dma_ops);
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index adbf616d35d15d..0310e6569350da 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -678,6 +678,8 @@ static const struct dma_map_ops gart_dma_ops = {
 	.get_sgtable			= dma_common_get_sgtable,
 	.dma_supported			= dma_direct_supported,
 	.get_required_mask		= dma_direct_get_required_mask,
+	.alloc_pages			= dma_direct_alloc_pages,
+	.free_pages			= dma_direct_free_pages,
 };
 
 static void gart_iommu_shutdown(void)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd07..3da06df0f327c2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1120,6 +1120,8 @@ static unsigned long iommu_dma_get_merge_boundary(struct
device *dev)
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc			= iommu_dma_alloc,
 	.free			= iommu_dma_free,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 	.mmap			= iommu_dma_mmap,
 	.get_sgtable		= iommu_dma_get_sgtable,
 	.map_page		= iommu_dma_map_page,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 99aa80456b7145..41fb349c1fd76b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3669,6 +3669,8 @@ static const struct dma_map_ops intel_dma_ops = {
 	.dma_supported = dma_direct_supported,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
+	.alloc_pages = dma_common_alloc_pages,
+	.free_pages = dma_common_free_pages,
 	.get_required_mask = intel_get_required_mask,
 };
 
@@ -3922,6 +3924,8 @@ static const struct dma_map_ops bounce_dma_ops = {
 	.sync_sg_for_device	= bounce_sync_sg_for_device,
 	.map_resource		= bounce_map_resource,
 	.unmap_resource		= bounce_unmap_resource,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 	.dma_supported		= dma_direct_supported,
 };
 
diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c
b/drivers/net/ethernet/i825xx/lasi_82596.c
index 8c5ab9b7553e75..0c493b7237a910 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -184,9 +184,8 @@ lan_init_chip(struct parisc_device *dev)
 
 	lp = netdev_priv(netdevice);
 	lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
-	lp->dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
-			      &lp->dma_addr, GFP_KERNEL,
-			      DMA_ATTR_NON_CONSISTENT);
+	lp->dma = dma_alloc_pages(dev->dev.parent, sizeof(struct i596_dma),
+				  &lp->dma_addr, DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!lp->dma)
 		goto out_free_netdev;
 
@@ -196,8 +195,8 @@ lan_init_chip(struct parisc_device *dev)
 	return 0;
 
 out_free_dma:
-	dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
-		       lp->dma, lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(dev->dev.parent, sizeof(struct i596_dma),
+		       lp->dma, lp->dma_addr, DMA_BIDIRECTIONAL);
 out_free_netdev:
 	free_netdev(netdevice);
 	return retval;
@@ -209,8 +208,8 @@ static int __exit lan_remove_chip(struct parisc_device
*pdev)
 	struct i596_private *lp = netdev_priv(dev);
 
 	unregister_netdev (dev);
-	dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
-		       lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, sizeof(struct i596_private), lp->dma,
+		       lp->dma_addr, DMA_BIDIRECTIONAL);
 	free_netdev (dev);
 	return 0;
 }
diff --git a/drivers/net/ethernet/seeq/sgiseeq.c
b/drivers/net/ethernet/seeq/sgiseeq.c
index 8507ff2420143a..39599bbb5d45b6 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -740,8 +740,8 @@ static int sgiseeq_probe(struct platform_device *pdev)
 	sp = netdev_priv(dev);
 
 	/* Make private data page aligned */
-	sr = dma_alloc_attrs(&pdev->dev, sizeof(*sp->srings),
&sp->srings_dma,
-			     GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	sr = dma_alloc_pages(&pdev->dev, sizeof(*sp->srings),
&sp->srings_dma,
+			     DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!sr) {
 		printk(KERN_ERR "Sgiseeq: Page alloc failed, aborting.\n");
 		err = -ENOMEM;
@@ -802,8 +802,8 @@ static int sgiseeq_probe(struct platform_device *pdev)
 	return 0;
 
 err_out_free_attrs:
-	dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings,
-		       sp->srings_dma, DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, sizeof(*sp->srings), sp->srings,
+		       sp->srings_dma, DMA_BIDIRECTIONAL);
 err_out_free_dev:
 	free_netdev(dev);
 
@@ -817,8 +817,8 @@ static int sgiseeq_remove(struct platform_device *pdev)
 	struct sgiseeq_private *sp = netdev_priv(dev);
 
 	unregister_netdev(dev);
-	dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings,
-		       sp->srings_dma, DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, sizeof(*sp->srings), sp->srings,
+		       sp->srings_dma, DMA_BIDIRECTIONAL);
 	free_netdev(dev);
 
 	return 0;
diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index a5507f75b524c4..ae557e740781e3 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -1025,6 +1025,8 @@ static const struct dma_map_ops ccio_ops = {
 	.map_sg = 		ccio_map_sg,
 	.unmap_sg = 		ccio_unmap_sg,
 	.get_sgtable =		dma_common_get_sgtable,
+	.alloc_pages =		dma_common_alloc_pages,
+	.free_pages =		dma_common_free_pages,
 };
 
 #ifdef CONFIG_PROC_FS
diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index d4314fba026914..d3514c5761a944 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -1077,6 +1077,8 @@ static const struct dma_map_ops sba_ops = {
 	.map_sg =		sba_map_sg,
 	.unmap_sg =		sba_unmap_sg,
 	.get_sgtable =		dma_common_get_sgtable,
+	.alloc_pages =		dma_common_alloc_pages,
+	.free_pages =		dma_common_free_pages,
 };
 
 
diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index b197ed9399e2e0..521950d0731e4a 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -300,8 +300,8 @@ NCR_700_detect(struct scsi_host_template *tpnt,
 	memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL);
 	if (!memory) {
 		hostdata->noncoherent = 1;
-		memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
-					 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+		memory = dma_alloc_pages(dev, TOTAL_MEM_SIZE, &pScript,
+					 GFP_KERNEL, DMA_BIDIRECTIONAL);
 	}
 	if (!memory) {
 		printk(KERN_ERR "53c700: Failed to allocate memory for driver,
detaching\n");
@@ -414,8 +414,8 @@ NCR_700_release(struct Scsi_Host *host)
 		(struct NCR_700_Host_Parameters *)host->hostdata[0];
 
 	if (hostdata->noncoherent)
-		dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
-			       hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+		dma_free_pages(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
+			       hostdata->pScript, DMA_BIDIRECTIONAL);
 	else
 		dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
 				  hostdata->script, hostdata->pScript);
diff --git a/drivers/scsi/sgiwd93.c b/drivers/scsi/sgiwd93.c
index 3bdf0deb8f1529..133adcf99e9340 100644
--- a/drivers/scsi/sgiwd93.c
+++ b/drivers/scsi/sgiwd93.c
@@ -234,8 +234,8 @@ static int sgiwd93_probe(struct platform_device *pdev)
 
 	hdata = host_to_hostdata(host);
 	hdata->dev = &pdev->dev;
-	hdata->cpu = dma_alloc_attrs(&pdev->dev, HPC_DMA_SIZE,
&hdata->dma,
-				     GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	hdata->cpu = dma_alloc_pages(&pdev->dev, HPC_DMA_SIZE,
&hdata->dma,
+				     DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!hdata->cpu) {
 		printk(KERN_WARNING "sgiwd93: Could not allocate memory for "
 		       "host %d buffer.\n", unit);
@@ -274,8 +274,8 @@ static int sgiwd93_probe(struct platform_device *pdev)
 out_irq:
 	free_irq(irq, host);
 out_free:
-	dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-		       DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
+			DMA_BIDIRECTIONAL);
 out_put:
 	scsi_host_put(host);
 out:
@@ -291,8 +291,8 @@ static int sgiwd93_remove(struct platform_device *pdev)
 
 	scsi_remove_host(host);
 	free_irq(pd->irq, host);
-	dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-		       DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
+			DMA_BIDIRECTIONAL);
 	scsi_host_put(host);
 	return 0;
 }
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 39a0f2e0847c95..030a225624b060 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -578,4 +578,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
 	.dma_supported = xen_swiotlb_dma_supported,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
+	.alloc_pages = dma_common_alloc_pages,
+	.free_pages = dma_common_free_pages,
 };
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 3797ecccc15466..7871ce3887cf85 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -85,6 +85,11 @@ void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t dma_addr, unsigned long attrs);
+void *dma_direct_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
+void dma_direct_free_pages(struct device *dev, size_t size,
+		void *cpu_addr, dma_addr_t dma_addr,
+		enum dma_data_direction dir);
 int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 016b96b384bdda..73fa6e10c5c8b5 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -27,11 +27,6 @@
  * buffered to improve performance.
  */
 #define DMA_ATTR_WRITE_COMBINE		(1UL << 2)
-/*
- * DMA_ATTR_NON_CONSISTENT: Lets the platform to choose to return either
- * consistent or non-consistent memory as it sees fit.
- */
-#define DMA_ATTR_NON_CONSISTENT		(1UL << 3)
 /*
  * DMA_ATTR_NO_KERNEL_MAPPING: Lets the platform to avoid creating a kernel
  * virtual mapping for the allocated buffer.
@@ -80,6 +75,11 @@ struct dma_map_ops {
 	void (*free)(struct device *dev, size_t size,
 			      void *vaddr, dma_addr_t dma_handle,
 			      unsigned long attrs);
+	void *(*alloc_pages)(struct device *dev, size_t size,
+			dma_addr_t *dma_handle, enum dma_data_direction dir,
+			gfp_t gfp);
+	void (*free_pages)(struct device *dev, size_t size, void *vaddr,
+			dma_addr_t dma_handle, enum dma_data_direction dir);
 	int (*mmap)(struct device *, struct vm_area_struct *,
 			  void *, dma_addr_t, size_t,
 			  unsigned long attrs);
@@ -254,6 +254,11 @@ void *dmam_alloc_attrs(struct device *dev, size_t size,
dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void dmam_free_coherent(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_handle);
+void *dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		enum dma_data_direction dir, gfp_t gfp);
+void dma_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir);
+/* dma_cache_sync is deprecated: don't use in new code */
 void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		enum dma_data_direction dir);
 int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
@@ -339,6 +344,15 @@ static inline void dmam_free_coherent(struct device *dev,
size_t size,
 		void *vaddr, dma_addr_t dma_handle)
 {
 }
+static inline void *dma_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+	return NULL;
+}
+static inline void dma_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+}
 static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		enum dma_data_direction dir)
 {
@@ -513,7 +527,10 @@ static inline void dma_sync_sgtable_for_device(struct
device *dev,
 extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
-
+void *dma_common_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
+void dma_common_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir);
 struct page **dma_common_find_pages(void *cpu_addr);
 void *dma_common_contiguous_remap(struct page *page, size_t size,
 			pgprot_t prot, const void *caller);
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index b9bc6c557ea46f..1eecfd24d434f8 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -31,9 +31,6 @@ static __always_inline bool dma_alloc_need_uncached(struct
device *dev,
 		return false;
 	if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
 		return false;
-	if (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
-	    (attrs & DMA_ATTR_NON_CONSISTENT))
-		return false;
 	return true;
 }
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index e7963e51660792..a0c4be0953b2b5 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Copyright (C) 2018 Christoph Hellwig.
+ * Copyright (C) 2018-2020 Christoph Hellwig.
  *
  * DMA operations that map physical memory directly without using an IOMMU.
  */
@@ -282,6 +282,55 @@ void dma_direct_free(struct device *dev, size_t size,
 	dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
+void *dma_direct_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+	struct page *page;
+	void *ret;
+
+	if (dma_should_alloc_from_pool(dev, gfp, 0)) {
+		page = dma_alloc_from_pool(dev, size, &ret, gfp,
+				dma_coherent_ok);
+		if (!page)
+			return NULL;
+		goto done;
+	}
+
+	page = __dma_direct_alloc_pages(dev, size, gfp);
+	if (!page)
+		return NULL;
+	ret = page_address(page);
+	if (force_dma_unencrypted(dev)) {
+		if (set_memory_decrypted((unsigned long)ret,
+				1 << get_order(size)))
+			goto out_free_pages;
+	}
+	memset(ret, 0, size);
+done:
+	*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
+	return ret;
+out_free_pages:
+	dma_free_contiguous(dev, page, size);
+	return NULL;
+}
+
+void dma_direct_free_pages(struct device *dev, size_t size,
+		void *cpu_addr, dma_addr_t dma_addr,
+		enum dma_data_direction dir)
+{
+	unsigned int page_order = get_order(size);
+
+	/* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
+	if (dma_should_free_from_pool(dev, 0) &&
+	    dma_free_from_pool(dev, cpu_addr, size))
+		return;
+
+	if (force_dma_unencrypted(dev))
+		set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
+
+	dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
+}
+
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_SWIOTLB)
 void dma_direct_sync_sg_for_device(struct device *dev,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 848c95c27d79ff..dacdb7226caacd 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -330,9 +330,7 @@ pgprot_t dma_pgprot(struct device *dev, pgprot_t prot,
unsigned long attrs)
 {
 	if (force_dma_unencrypted(dev))
 		prot = pgprot_decrypted(prot);
-	if (dev_is_dma_coherent(dev) ||
-	    (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
-             (attrs & DMA_ATTR_NON_CONSISTENT)))
+	if (dev_is_dma_coherent(dev))
 		return prot;
 #ifdef CONFIG_ARCH_HAS_DMA_WRITE_COMBINE
 	if (attrs & DMA_ATTR_WRITE_COMBINE)
@@ -461,6 +459,45 @@ void dma_free_attrs(struct device *dev, size_t size, void
*cpu_addr,
 }
 EXPORT_SYMBOL(dma_free_attrs);
 
+void *dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		enum dma_data_direction dir, gfp_t gfp)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	void *vaddr;
+
+	if (WARN_ON_ONCE(!dev->coherent_dma_mask))
+		return NULL;
+	if (WARN_ON_ONCE(gfp & (__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM)))
+		return NULL;
+
+	size = PAGE_ALIGN(size);
+	if (dma_alloc_direct(dev, ops))
+		vaddr = dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
+	else if (ops->alloc_pages)
+		vaddr = ops->alloc_pages(dev, size, dma_handle, dir, gfp);
+	else
+		return NULL;
+
+	debug_dma_map_page(dev, virt_to_page(vaddr), 0, size, dir, *dma_handle);
+	return vaddr;
+}
+EXPORT_SYMBOL_GPL(dma_alloc_pages);
+
+void dma_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	size = PAGE_ALIGN(size);
+	debug_dma_unmap_page(dev, dma_handle, size, dir);
+
+	if (dma_alloc_direct(dev, ops))
+		dma_direct_free_pages(dev, size, vaddr, dma_handle, dir);
+	else if (ops->free_pages)
+		ops->free_pages(dev, size, vaddr, dma_handle, dir);
+}
+EXPORT_SYMBOL_GPL(dma_free_pages);
+
 int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index e443c69be4299f..17e66a41dc9c78 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -3,6 +3,7 @@
  * Helpers for DMA ops implementations.  These generally rely on the fact that
  * the allocated memory contains normal pages in the direct kernel mapping.
  */
+#include <linux/dma-contiguous.h>
 #include <linux/dma-noncoherent.h>
 
 /*
@@ -49,3 +50,37 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct
*vma,
 	return -ENXIO;
 #endif /* CONFIG_MMU */
 }
+
+void *dma_common_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	struct page *page;
+
+	page = dma_alloc_contiguous(dev, size, gfp);
+	if (!page)
+		page = alloc_pages_node(dev_to_node(dev), gfp, get_order(size));
+	if (!page)
+		return NULL;
+
+	*dma_handle = ops->map_page(dev, page, 0, size, dir,
+				    DMA_ATTR_SKIP_CPU_SYNC);
+	if (*dma_handle == DMA_MAPPING_ERROR) {
+		dma_free_contiguous(dev, page, size);
+		return NULL;
+	}
+
+	memset(page_address(page), 0, size);
+	return page_address(page);
+}
+
+void dma_common_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	if (ops->unmap_page)
+		ops->unmap_page(dev, dma_handle, size, dir,
+				DMA_ATTR_SKIP_CPU_SYNC);
+	dma_free_contiguous(dev, virt_to_page(vaddr), size);
+}
diff --git a/kernel/dma/virt.c b/kernel/dma/virt.c
index ebe128833af7b5..6986bf1fd6689c 100644
--- a/kernel/dma/virt.c
+++ b/kernel/dma/virt.c
@@ -55,5 +55,7 @@ const struct dma_map_ops dma_virt_ops = {
 	.free			= dma_virt_free,
 	.map_page		= dma_virt_map_page,
 	.map_sg			= dma_virt_map_sg,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
 EXPORT_SYMBOL(dma_virt_ops);
diff --git a/sound/mips/hal2.c b/sound/mips/hal2.c
index ec84bc4c3a6e77..746c410bd9bf11 100644
--- a/sound/mips/hal2.c
+++ b/sound/mips/hal2.c
@@ -449,15 +449,15 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct
hal2_codec *codec)
 	int count = H2_BUF_SIZE / H2_BLOCK_SIZE;
 	int i;
 
-	codec->buffer = dma_alloc_attrs(dev, H2_BUF_SIZE, &buffer_dma,
-					GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	codec->buffer = dma_alloc_pages(dev, H2_BUF_SIZE, &buffer_dma,
+					DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!codec->buffer)
 		return -ENOMEM;
-	desc = dma_alloc_attrs(dev, count * sizeof(struct hal2_desc),
-			       &desc_dma, GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	desc = dma_alloc_pages(dev, count * sizeof(struct hal2_desc), &desc_dma,
+			       DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!desc) {
-		dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
-			       DMA_ATTR_NON_CONSISTENT);
+		dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
+				DMA_BIDIRECTIONAL);
 		return -ENOMEM;
 	}
 	codec->buffer_dma = buffer_dma;
@@ -480,10 +480,10 @@ static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct
hal2_codec *codec)
 {
 	struct device *dev = hal2->card->dev;
 
-	dma_free_attrs(dev, codec->desc_count * sizeof(struct hal2_desc),
-		       codec->desc, codec->desc_dma, DMA_ATTR_NON_CONSISTENT);
-	dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
-		       DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(dev, codec->desc_count * sizeof(struct hal2_desc),
+		       codec->desc, codec->desc_dma, DMA_BIDIRECTIONAL);
+	dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
+			DMA_BIDIRECTIONAL);
 }
 
 static const struct snd_pcm_hardware hal2_pcm_hw = {
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 20/28] sgiwd93: convert from dma_cache_sync to dma_sync_single_for_device
Use the proper modern API to transfer cache ownership for incoherent DMA.
This also means we can allocate the memory as DMA_TO_DEVICE instead
of bidirectional.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/scsi/sgiwd93.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/sgiwd93.c b/drivers/scsi/sgiwd93.c
index 133adcf99e9340..1538f65307f22f 100644
--- a/drivers/scsi/sgiwd93.c
+++ b/drivers/scsi/sgiwd93.c
@@ -95,7 +95,7 @@ void fill_hpc_entries(struct ip22_hostdata *hd, struct
scsi_cmnd *cmd, int din)
 	 */
 	hcp->desc.pbuf = 0;
 	hcp->desc.cntinfo = HPCDMA_EOX;
-	dma_cache_sync(hd->dev, hd->cpu,
+	dma_sync_single_for_device(hd->dev, hd->dma,
 		       (unsigned long)(hcp + 1) - (unsigned long)hd->cpu,
 		       DMA_TO_DEVICE);
 }
@@ -235,7 +235,7 @@ static int sgiwd93_probe(struct platform_device *pdev)
 	hdata = host_to_hostdata(host);
 	hdata->dev = &pdev->dev;
 	hdata->cpu = dma_alloc_pages(&pdev->dev, HPC_DMA_SIZE,
&hdata->dma,
-				     DMA_BIDIRECTIONAL, GFP_KERNEL);
+				     DMA_TO_DEVICE, GFP_KERNEL);
 	if (!hdata->cpu) {
 		printk(KERN_WARNING "sgiwd93: Could not allocate memory for "
 		       "host %d buffer.\n", unit);
@@ -275,7 +275,7 @@ static int sgiwd93_probe(struct platform_device *pdev)
 	free_irq(irq, host);
 out_free:
 	dma_free_pages(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-			DMA_BIDIRECTIONAL);
+			DMA_TO_DEVICE);
 out_put:
 	scsi_host_put(host);
 out:
@@ -292,7 +292,7 @@ static int sgiwd93_remove(struct platform_device *pdev)
 	scsi_remove_host(host);
 	free_irq(pd->irq, host);
 	dma_free_pages(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-			DMA_BIDIRECTIONAL);
+			DMA_TO_DEVICE);
 	scsi_host_put(host);
 	return 0;
 }
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 21/28] hal2: convert from dma_cache_sync to dma_sync_single_for_device
Use the proper modern API to transfer cache ownership for incoherent DMA.
This also means we can allocate the buffer memory with the proper
direction instead of bidirectional.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 sound/mips/hal2.c | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)
diff --git a/sound/mips/hal2.c b/sound/mips/hal2.c
index 746c410bd9bf11..c8e429a5f48f85 100644
--- a/sound/mips/hal2.c
+++ b/sound/mips/hal2.c
@@ -441,7 +441,8 @@ static inline void hal2_stop_adc(struct snd_hal2 *hal2)
 	hal2->adc.pbus.pbus->pbdma_ctrl = HPC3_PDMACTRL_LD;
 }
 
-static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+		enum dma_data_direction buffer_dir)
 {
 	struct device *dev = hal2->card->dev;
 	struct hal2_desc *desc;
@@ -450,14 +451,14 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct
hal2_codec *codec)
 	int i;
 
 	codec->buffer = dma_alloc_pages(dev, H2_BUF_SIZE, &buffer_dma,
-					DMA_BIDIRECTIONAL, GFP_KERNEL);
+					buffer_dir, GFP_KERNEL);
 	if (!codec->buffer)
 		return -ENOMEM;
 	desc = dma_alloc_pages(dev, count * sizeof(struct hal2_desc), &desc_dma,
 			       DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!desc) {
 		dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
-				DMA_BIDIRECTIONAL);
+				buffer_dir);
 		return -ENOMEM;
 	}
 	codec->buffer_dma = buffer_dma;
@@ -470,20 +471,22 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct
hal2_codec *codec)
 		      desc_dma : desc_dma + (i + 1) * sizeof(struct hal2_desc);
 		desc++;
 	}
-	dma_cache_sync(dev, codec->desc, count * sizeof(struct hal2_desc),
-		       DMA_TO_DEVICE);
+	dma_sync_single_for_device(dev, codec->desc_dma,
+				   count * sizeof(struct hal2_desc),
+				   DMA_BIDIRECTIONAL);
 	codec->desc_count = count;
 	return 0;
 }
 
-static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+		enum dma_data_direction buffer_dir)
 {
 	struct device *dev = hal2->card->dev;
 
 	dma_free_pages(dev, codec->desc_count * sizeof(struct hal2_desc),
 		       codec->desc, codec->desc_dma, DMA_BIDIRECTIONAL);
 	dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
-			DMA_BIDIRECTIONAL);
+			buffer_dir);
 }
 
 static const struct snd_pcm_hardware hal2_pcm_hw = {
@@ -509,21 +512,16 @@ static int hal2_playback_open(struct snd_pcm_substream
*substream)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
-	int err;
 
 	runtime->hw = hal2_pcm_hw;
-
-	err = hal2_alloc_dmabuf(hal2, &hal2->dac);
-	if (err)
-		return err;
-	return 0;
+	return hal2_alloc_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE);
 }
 
 static int hal2_playback_close(struct snd_pcm_substream *substream)
 {
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 
-	hal2_free_dmabuf(hal2, &hal2->dac);
+	hal2_free_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE);
 	return 0;
 }
 
@@ -579,7 +577,9 @@ static void hal2_playback_transfer(struct snd_pcm_substream
*substream,
 	unsigned char *buf = hal2->dac.buffer + rec->hw_data;
 
 	memcpy(buf, substream->runtime->dma_area + rec->sw_data, bytes);
-	dma_cache_sync(hal2->card->dev, buf, bytes, DMA_TO_DEVICE);
+	dma_sync_single_for_device(hal2->card->dev,
+			hal2->dac.buffer_dma + rec->hw_data, bytes,
+			DMA_TO_DEVICE);
 
 }
 
@@ -597,22 +597,16 @@ static int hal2_capture_open(struct snd_pcm_substream
*substream)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
-	struct hal2_codec *adc = &hal2->adc;
-	int err;
 
 	runtime->hw = hal2_pcm_hw;
-
-	err = hal2_alloc_dmabuf(hal2, adc);
-	if (err)
-		return err;
-	return 0;
+	return hal2_alloc_dmabuf(hal2, &hal2->adc, DMA_FROM_DEVICE);
 }
 
 static int hal2_capture_close(struct snd_pcm_substream *substream)
 {
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 
-	hal2_free_dmabuf(hal2, &hal2->adc);
+	hal2_free_dmabuf(hal2, &hal2->adc, DMA_FROM_DEVICE);
 	return 0;
 }
 
@@ -667,7 +661,9 @@ static void hal2_capture_transfer(struct snd_pcm_substream
*substream,
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 	unsigned char *buf = hal2->adc.buffer + rec->hw_data;
 
-	dma_cache_sync(hal2->card->dev, buf, bytes, DMA_FROM_DEVICE);
+	dma_sync_single_for_cpu(hal2->card->dev,
+			hal2->adc.buffer_dma + rec->hw_data, bytes,
+			DMA_FROM_DEVICE);
 	memcpy(substream->runtime->dma_area + rec->sw_data, buf, bytes);
 }
 
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
Use the proper modern API to transfer cache ownership for incoherent DMA.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/net/ethernet/seeq/sgiseeq.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/seeq/sgiseeq.c
b/drivers/net/ethernet/seeq/sgiseeq.c
index 39599bbb5d45b6..f91dae16d69a19 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -112,14 +112,18 @@ struct sgiseeq_private {
 
 static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
 {
-	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-		       DMA_FROM_DEVICE);
+	struct sgiseeq_private *sp = netdev_priv(dev);
+
+	dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
 }
 
 static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
 {
-	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-		       DMA_TO_DEVICE);
+	struct sgiseeq_private *sp = netdev_priv(dev);
+
+	dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
 }
 
 static inline void hpc3_eth_reset(struct hpc3_ethregs *hregs)
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 23/28] lib82596: convert from dma_cache_sync to dma_sync_single_for_device
Use the proper modern API to transfer cache ownership for incoherent DMA.
Note that this moves the DMA helpers to the main lib82596.c file, so
that they can use virt_to_dma.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/net/ethernet/i825xx/lasi_82596.c |  11 +--
 drivers/net/ethernet/i825xx/lib82596.c   | 114 ++++++++++++++---------
 drivers/net/ethernet/i825xx/sni_82596.c  |   4 -
 3 files changed, 73 insertions(+), 56 deletions(-)
diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c
b/drivers/net/ethernet/i825xx/lasi_82596.c
index 0c493b7237a910..d13b610935bcf3 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,21 +96,14 @@
 
 #define OPT_SWAP_PORT	0x0001	/* Need to wordswp on the MPU port */
 
-#define DMA_WBACK(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE);
} while (0)
-
-#define DMA_INV(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len,
DMA_FROM_DEVICE); } while (0)
-
-#define DMA_WBACK_INV(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len,
DMA_BIDIRECTIONAL); } while (0)
-
 #define SYSBUS      0x0000006c
 
 /* big endian CPU, 82596 "big" endian mode */
 #define SWAP32(x)   (((u32)(x)<<16) | ((((u32)(x)))>>16))
 #define SWAP16(x)   (x)
 
+#define NONCOHERENT_DMA 1
+
 #include "lib82596.c"
 
 MODULE_AUTHOR("Richard Hirst");
diff --git a/drivers/net/ethernet/i825xx/lib82596.c
b/drivers/net/ethernet/i825xx/lib82596.c
index b4e4b3eb5758b5..ca2fb303fcc6f6 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -365,13 +365,44 @@ static int max_cmd_backlog = TX_RING_SIZE-1;
 static void i596_poll_controller(struct net_device *dev);
 #endif
 
+static inline dma_addr_t virt_to_dma(struct i596_private *lp, volatile void *v)
+{
+	return lp->dma_addr + ((unsigned long)v - (unsigned long)lp->dma);
+}
+
+#ifdef NONCOHERENT_DMA
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+		size_t len)
+{
+	dma_sync_single_for_device(ndev->dev.parent,
+			virt_to_dma(netdev_priv(ndev), addr), len,
+			DMA_BIDIRECTIONAL);
+}
+
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+		size_t len)
+{
+	dma_sync_single_for_cpu(ndev->dev.parent,
+			virt_to_dma(netdev_priv(ndev), addr), len,
+			DMA_BIDIRECTIONAL);
+}
+#else
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+		size_t len)
+{
+}
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+		size_t len)
+{
+}
+#endif /* NONCOHERENT_DMA */
 
 static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int
delcnt, char *str)
 {
-	DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+	dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
 	while (--delcnt && dma->iscp.stat) {
 		udelay(10);
-		DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+		dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
 	}
 	if (!delcnt) {
 		printk(KERN_ERR "%s: %s, iscp.stat %04x, didn't clear\n",
@@ -384,10 +415,10 @@ static inline int wait_istat(struct net_device *dev,
struct i596_dma *dma, int d
 
 static inline int wait_cmd(struct net_device *dev, struct i596_dma *dma, int
delcnt, char *str)
 {
-	DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb));
+	dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb));
 	while (--delcnt && dma->scb.command) {
 		udelay(10);
-		DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb));
+		dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb));
 	}
 	if (!delcnt) {
 		printk(KERN_ERR "%s: %s, status %4.4x, cmd %4.4x.\n",
@@ -451,12 +482,9 @@ static void i596_display_data(struct net_device *dev)
 		       SWAP32(rbd->b_data), SWAP16(rbd->size));
 		rbd = rbd->v_next;
 	} while (rbd != lp->rbd_head);
-	DMA_INV(dev, dma, sizeof(struct i596_dma));
+	dma_sync_cpu(dev, dma, sizeof(struct i596_dma));
 }
 
-
-#define virt_to_dma(lp, v) ((lp)->dma_addr + (dma_addr_t)((unsigned
long)(v)-(unsigned long)((lp)->dma)))
-
 static inline int init_rx_bufs(struct net_device *dev)
 {
 	struct i596_private *lp = netdev_priv(dev);
@@ -508,7 +536,7 @@ static inline int init_rx_bufs(struct net_device *dev)
 	rfd->b_next = SWAP32(virt_to_dma(lp, dma->rfds));
 	rfd->cmd = SWAP16(CMD_EOL|CMD_FLEX);
 
-	DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+	dma_sync_dev(dev, dma, sizeof(struct i596_dma));
 	return 0;
 }
 
@@ -547,7 +575,7 @@ static void rebuild_rx_bufs(struct net_device *dev)
 	lp->rbd_head = dma->rbds;
 	dma->rfds[0].rbd = SWAP32(virt_to_dma(lp, dma->rbds));
 
-	DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+	dma_sync_dev(dev, dma, sizeof(struct i596_dma));
 }
 
 
@@ -575,9 +603,9 @@ static int init_i596_mem(struct net_device *dev)
 
 	DEB(DEB_INIT, printk(KERN_DEBUG "%s: starting i82596.\n",
dev->name));
 
-	DMA_WBACK(dev, &(dma->scp), sizeof(struct i596_scp));
-	DMA_WBACK(dev, &(dma->iscp), sizeof(struct i596_iscp));
-	DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(dma->scp), sizeof(struct i596_scp));
+	dma_sync_dev(dev, &(dma->iscp), sizeof(struct i596_iscp));
+	dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
 
 	mpu_port(dev, PORT_ALTSCP, virt_to_dma(lp, &dma->scp));
 	ca(dev);
@@ -596,24 +624,24 @@ static int init_i596_mem(struct net_device *dev)
 	rebuild_rx_bufs(dev);
 
 	dma->scb.command = 0;
-	DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
 
 	DEB(DEB_INIT, printk(KERN_DEBUG
 			     "%s: queuing CmdConfigure\n", dev->name));
 	memcpy(dma->cf_cmd.i596_config, init_setup, 14);
 	dma->cf_cmd.cmd.command = SWAP16(CmdConfigure);
-	DMA_WBACK(dev, &(dma->cf_cmd), sizeof(struct cf_cmd));
+	dma_sync_dev(dev, &(dma->cf_cmd), sizeof(struct cf_cmd));
 	i596_add_cmd(dev, &dma->cf_cmd.cmd);
 
 	DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdSASetup\n",
dev->name));
 	memcpy(dma->sa_cmd.eth_addr, dev->dev_addr, ETH_ALEN);
 	dma->sa_cmd.cmd.command = SWAP16(CmdSASetup);
-	DMA_WBACK(dev, &(dma->sa_cmd), sizeof(struct sa_cmd));
+	dma_sync_dev(dev, &(dma->sa_cmd), sizeof(struct sa_cmd));
 	i596_add_cmd(dev, &dma->sa_cmd.cmd);
 
 	DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdTDR\n",
dev->name));
 	dma->tdr_cmd.cmd.command = SWAP16(CmdTDR);
-	DMA_WBACK(dev, &(dma->tdr_cmd), sizeof(struct tdr_cmd));
+	dma_sync_dev(dev, &(dma->tdr_cmd), sizeof(struct tdr_cmd));
 	i596_add_cmd(dev, &dma->tdr_cmd.cmd);
 
 	spin_lock_irqsave (&lp->lock, flags);
@@ -625,7 +653,7 @@ static int init_i596_mem(struct net_device *dev)
 	DEB(DEB_INIT, printk(KERN_DEBUG "%s: Issuing RX_START\n",
dev->name));
 	dma->scb.command = SWAP16(RX_START);
 	dma->scb.rfd = SWAP32(virt_to_dma(lp, dma->rfds));
-	DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
 
 	ca(dev);
 
@@ -659,13 +687,13 @@ static inline int i596_rx(struct net_device *dev)
 
 	rfd = lp->rfd_head;		/* Ref next frame to check */
 
-	DMA_INV(dev, rfd, sizeof(struct i596_rfd));
+	dma_sync_cpu(dev, rfd, sizeof(struct i596_rfd));
 	while (rfd->stat & SWAP16(STAT_C)) {	/* Loop while complete frames */
 		if (rfd->rbd == I596_NULL)
 			rbd = NULL;
 		else if (rfd->rbd == lp->rbd_head->b_addr) {
 			rbd = lp->rbd_head;
-			DMA_INV(dev, rbd, sizeof(struct i596_rbd));
+			dma_sync_cpu(dev, rbd, sizeof(struct i596_rbd));
 		} else {
 			printk(KERN_ERR "%s: rbd chain broken!\n", dev->name);
 			/* XXX Now what? */
@@ -713,7 +741,7 @@ static inline int i596_rx(struct net_device *dev)
 							  DMA_FROM_DEVICE);
 				rbd->v_data = newskb->data;
 				rbd->b_data = SWAP32(dma_addr);
-				DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd));
+				dma_sync_dev(dev, rbd, sizeof(struct i596_rbd));
 			} else {
 				skb = netdev_alloc_skb_ip_align(dev, pkt_len);
 			}
@@ -765,7 +793,7 @@ static inline int i596_rx(struct net_device *dev)
 		if (rbd != NULL && (rbd->count & SWAP16(0x4000))) {
 			rbd->count = 0;
 			lp->rbd_head = rbd->v_next;
-			DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd));
+			dma_sync_dev(dev, rbd, sizeof(struct i596_rbd));
 		}
 
 		/* Tidy the frame descriptor, marking it as end of list */
@@ -779,14 +807,14 @@ static inline int i596_rx(struct net_device *dev)
 
 		lp->dma->scb.rfd = rfd->b_next;
 		lp->rfd_head = rfd->v_next;
-		DMA_WBACK_INV(dev, rfd, sizeof(struct i596_rfd));
+		dma_sync_dev(dev, rfd, sizeof(struct i596_rfd));
 
 		/* Remove end-of-list from old end descriptor */
 
 		rfd->v_prev->cmd = SWAP16(CMD_FLEX);
-		DMA_WBACK_INV(dev, rfd->v_prev, sizeof(struct i596_rfd));
+		dma_sync_dev(dev, rfd->v_prev, sizeof(struct i596_rfd));
 		rfd = lp->rfd_head;
-		DMA_INV(dev, rfd, sizeof(struct i596_rfd));
+		dma_sync_cpu(dev, rfd, sizeof(struct i596_rfd));
 	}
 
 	DEB(DEB_RXFRAME, printk(KERN_DEBUG "frames %d\n", frames));
@@ -827,12 +855,12 @@ static inline void i596_cleanup_cmd(struct net_device
*dev, struct i596_private
 			ptr->v_next = NULL;
 			ptr->b_next = I596_NULL;
 		}
-		DMA_WBACK_INV(dev, ptr, sizeof(struct i596_cmd));
+		dma_sync_dev(dev, ptr, sizeof(struct i596_cmd));
 	}
 
 	wait_cmd(dev, lp->dma, 100, "i596_cleanup_cmd timed out");
 	lp->dma->scb.cmd = I596_NULL;
-	DMA_WBACK(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
 }
 
 
@@ -850,7 +878,7 @@ static inline void i596_reset(struct net_device *dev, struct
i596_private *lp)
 
 	/* FIXME: this command might cause an lpmc */
 	lp->dma->scb.command = SWAP16(CUC_ABORT | RX_ABORT);
-	DMA_WBACK(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
 	ca(dev);
 
 	/* wait for shutdown */
@@ -878,20 +906,20 @@ static void i596_add_cmd(struct net_device *dev, struct
i596_cmd *cmd)
 	cmd->command |= SWAP16(CMD_EOL | CMD_INTR);
 	cmd->v_next = NULL;
 	cmd->b_next = I596_NULL;
-	DMA_WBACK(dev, cmd, sizeof(struct i596_cmd));
+	dma_sync_dev(dev, cmd, sizeof(struct i596_cmd));
 
 	spin_lock_irqsave (&lp->lock, flags);
 
 	if (lp->cmd_head != NULL) {
 		lp->cmd_tail->v_next = cmd;
 		lp->cmd_tail->b_next = SWAP32(virt_to_dma(lp, &cmd->status));
-		DMA_WBACK(dev, lp->cmd_tail, sizeof(struct i596_cmd));
+		dma_sync_dev(dev, lp->cmd_tail, sizeof(struct i596_cmd));
 	} else {
 		lp->cmd_head = cmd;
 		wait_cmd(dev, dma, 100, "i596_add_cmd timed out");
 		dma->scb.cmd = SWAP32(virt_to_dma(lp, &cmd->status));
 		dma->scb.command = SWAP16(CUC_START);
-		DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+		dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
 		ca(dev);
 	}
 	lp->cmd_tail = cmd;
@@ -956,7 +984,7 @@ static void i596_tx_timeout (struct net_device *dev,
unsigned int txqueue)
 		/* Issue a channel attention signal */
 		DEB(DEB_ERRORS, printk(KERN_DEBUG "Kicking board.\n"));
 		lp->dma->scb.command = SWAP16(CUC_START | RX_START);
-		DMA_WBACK_INV(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+		dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
 		ca (dev);
 		lp->last_restart = dev->stats.tx_packets;
 	}
@@ -1014,8 +1042,8 @@ static netdev_tx_t i596_start_xmit(struct sk_buff *skb,
struct net_device *dev)
 		tbd->data = SWAP32(tx_cmd->dma_addr);
 
 		DEB(DEB_TXADDR, print_eth(skb->data, "tx-queued"));
-		DMA_WBACK_INV(dev, tx_cmd, sizeof(struct tx_cmd));
-		DMA_WBACK_INV(dev, tbd, sizeof(struct i596_tbd));
+		dma_sync_dev(dev, tx_cmd, sizeof(struct tx_cmd));
+		dma_sync_dev(dev, tbd, sizeof(struct i596_tbd));
 		i596_add_cmd(dev, &tx_cmd->cmd);
 
 		dev->stats.tx_packets++;
@@ -1071,7 +1099,7 @@ static int i82596_probe(struct net_device *dev)
 	lp->dma->scb.rfd = I596_NULL;
 	spin_lock_init(&lp->lock);
 
-	DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));
+	dma_sync_dev(dev, lp->dma, sizeof(struct i596_dma));
 
 	ret = register_netdev(dev);
 	if (ret)
@@ -1141,7 +1169,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
 				   dev->name, status & 0x0700));
 
 		while (lp->cmd_head != NULL) {
-			DMA_INV(dev, lp->cmd_head, sizeof(struct i596_cmd));
+			dma_sync_cpu(dev, lp->cmd_head, sizeof(struct i596_cmd));
 			if (!(lp->cmd_head->status & SWAP16(STAT_C)))
 				break;
 
@@ -1223,7 +1251,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
 			}
 			ptr->v_next = NULL;
 			ptr->b_next = I596_NULL;
-			DMA_WBACK(dev, ptr, sizeof(struct i596_cmd));
+			dma_sync_dev(dev, ptr, sizeof(struct i596_cmd));
 			lp->last_cmd = jiffies;
 		}
 
@@ -1237,13 +1265,13 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
 
 			ptr->command &= SWAP16(0x1fff);
 			ptr = ptr->v_next;
-			DMA_WBACK_INV(dev, prev, sizeof(struct i596_cmd));
+			dma_sync_dev(dev, prev, sizeof(struct i596_cmd));
 		}
 
 		if (lp->cmd_head != NULL)
 			ack_cmd |= CUC_START;
 		dma->scb.cmd = SWAP32(virt_to_dma(lp, &lp->cmd_head->status));
-		DMA_WBACK_INV(dev, &dma->scb, sizeof(struct i596_scb));
+		dma_sync_dev(dev, &dma->scb, sizeof(struct i596_scb));
 	}
 	if ((status & 0x1000) || (status & 0x4000)) {
 		if ((status & 0x4000))
@@ -1268,7 +1296,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
 	}
 	wait_cmd(dev, dma, 100, "i596 interrupt, timeout");
 	dma->scb.command = SWAP16(ack_cmd);
-	DMA_WBACK(dev, &dma->scb, sizeof(struct i596_scb));
+	dma_sync_dev(dev, &dma->scb, sizeof(struct i596_scb));
 
 	/* DANGER: I suspect that some kind of interrupt
 	 acknowledgement aside from acking the 82596 might be needed
@@ -1299,7 +1327,7 @@ static int i596_close(struct net_device *dev)
 
 	wait_cmd(dev, lp->dma, 100, "close1 timed out");
 	lp->dma->scb.command = SWAP16(CUC_ABORT | RX_ABORT);
-	DMA_WBACK(dev, &lp->dma->scb, sizeof(struct i596_scb));
+	dma_sync_dev(dev, &lp->dma->scb, sizeof(struct i596_scb));
 
 	ca(dev);
 
@@ -1358,7 +1386,7 @@ static void set_multicast_list(struct net_device *dev)
 			       dev->name);
 		else {
 			dma->cf_cmd.cmd.command = SWAP16(CmdConfigure);
-			DMA_WBACK_INV(dev, &dma->cf_cmd, sizeof(struct cf_cmd));
+			dma_sync_dev(dev, &dma->cf_cmd, sizeof(struct cf_cmd));
 			i596_add_cmd(dev, &dma->cf_cmd.cmd);
 		}
 	}
@@ -1390,7 +1418,7 @@ static void set_multicast_list(struct net_device *dev)
 					   dev->name, cp));
 			cp += ETH_ALEN;
 		}
-		DMA_WBACK_INV(dev, &dma->mc_cmd, sizeof(struct mc_cmd));
+		dma_sync_dev(dev, &dma->mc_cmd, sizeof(struct mc_cmd));
 		i596_add_cmd(dev, &cmd->cmd);
 	}
 }
diff --git a/drivers/net/ethernet/i825xx/sni_82596.c
b/drivers/net/ethernet/i825xx/sni_82596.c
index e80e790ffbd4d4..507d60cd6f9b33 100644
--- a/drivers/net/ethernet/i825xx/sni_82596.c
+++ b/drivers/net/ethernet/i825xx/sni_82596.c
@@ -24,10 +24,6 @@
 
 static const char sni_82596_string[] = "snirm_82596";
 
-#define DMA_WBACK(priv, addr, len)     do { } while (0)
-#define DMA_INV(priv, addr, len)       do { } while (0)
-#define DMA_WBACK_INV(priv, addr, len) do { } while (0)
-
 #define SYSBUS      0x00004400
 
 /* big endian CPU, 82596 little endian */
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 24/28] 53c700: convert from dma_cache_sync to dma_sync_single_for_device
Use the proper modern API to transfer cache ownership for incoherent DMA.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/scsi/53c700.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 521950d0731e4a..57a08c42d00325 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,18 +269,25 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
 					      spi_period(SDp->sdev_target));
 }
 
+static inline dma_addr_t virt_to_dma(struct NCR_700_Host_Parameters *h, void
*p)
+{
+	return h->pScript + ((uintptr_t)p - (uintptr_t)h->script);
+}
+
 static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
 		void *addr, size_t size)
 {
 	if (h->noncoherent)
-		dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+		dma_sync_single_for_device(h->dev, virt_to_dma(h, addr),
+					   size, DMA_BIDIRECTIONAL);
 }
 
 static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
 		void *addr, size_t size)
 {
 	if (h->noncoherent)
-		dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+		dma_sync_single_for_device(h->dev, virt_to_dma(h, addr), size,
+					   DMA_BIDIRECTIONAL);
 }
 
 struct Scsi_Host *
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 25/28] dma-mapping: remove dma_cache_sync
All users are gone now, remove the API.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 arch/mips/Kconfig               |  1 -
 arch/mips/jazz/jazzdma.c        |  1 -
 arch/mips/mm/dma-noncoherent.c  |  6 ------
 arch/parisc/Kconfig             |  1 -
 arch/parisc/kernel/pci-dma.c    |  6 ------
 include/linux/dma-mapping.h     |  9 ---------
 include/linux/dma-noncoherent.h | 10 ----------
 kernel/dma/Kconfig              |  3 ---
 kernel/dma/mapping.c            | 14 --------------
 9 files changed, 51 deletions(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c95fa3a2484cf0..1be91c5d666e61 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1134,7 +1134,6 @@ config DMA_NONCOHERENT
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select ARCH_HAS_DMA_SET_UNCACHED
 	select DMA_NONCOHERENT_MMAP
-	select DMA_NONCOHERENT_CACHE_SYNC
 	select NEED_DMA_MAP_STATE
 
 config SYS_HAS_EARLY_PRINTK
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index 0f9a9cb7fe7a95..e2efe43f5f9cc3 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -614,7 +614,6 @@ const struct dma_map_ops jazz_dma_ops = {
 	.sync_single_for_device	= jazz_dma_sync_single_for_device,
 	.sync_sg_for_cpu	= jazz_dma_sync_sg_for_cpu,
 	.sync_sg_for_device	= jazz_dma_sync_sg_for_device,
-	.cache_sync		= arch_dma_cache_sync,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
 	.alloc_pages		= dma_common_alloc_pages,
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 97a14adbafc99c..f34ad1f09799f1 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -137,12 +137,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 }
 #endif
 
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction direction)
-{
-	dma_sync_virt_for_device(vaddr, size, direction);
-}
-
 #ifdef CONFIG_DMA_PERDEV_COHERENT
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		const struct iommu_ops *iommu, bool coherent)
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 3b0f53dd70bc9b..ed15da1da174e0 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -195,7 +195,6 @@ config PA11
 	depends on PA7000 || PA7100LC || PA7200 || PA7300LC
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
-	select DMA_NONCOHERENT_CACHE_SYNC
 
 config PREFETCH
 	def_bool y
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 38c68e131bbe2a..ce38c0b9158125 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -454,9 +454,3 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 {
 	flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
 }
-
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-	       enum dma_data_direction direction)
-{
-	flush_kernel_dcache_range((unsigned long)vaddr, size);
-}
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 73fa6e10c5c8b5..7321df0b9ffc83 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -123,8 +123,6 @@ struct dma_map_ops {
 	void (*sync_sg_for_device)(struct device *dev,
 				   struct scatterlist *sg, int nents,
 				   enum dma_data_direction dir);
-	void (*cache_sync)(struct device *dev, void *vaddr, size_t size,
-			enum dma_data_direction direction);
 	int (*dma_supported)(struct device *dev, u64 mask);
 	u64 (*get_required_mask)(struct device *dev);
 	size_t (*max_mapping_size)(struct device *dev);
@@ -258,9 +256,6 @@ void *dma_alloc_pages(struct device *dev, size_t size,
dma_addr_t *dma_handle,
 		enum dma_data_direction dir, gfp_t gfp);
 void dma_free_pages(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_handle, enum dma_data_direction dir);
-/* dma_cache_sync is deprecated: don't use in new code */
-void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction dir);
 int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
@@ -353,10 +348,6 @@ static inline void dma_free_pages(struct device *dev,
size_t size, void *vaddr,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 }
-static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction dir)
-{
-}
 static inline int dma_get_sgtable_attrs(struct device *dev,
 		struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr,
 		size_t size, unsigned long attrs)
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 1eecfd24d434f8..e61283e06576a8 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -59,16 +59,6 @@ static inline pgprot_t dma_pgprot(struct device *dev,
pgprot_t prot,
 }
 #endif /* CONFIG_MMU */
 
-#ifdef CONFIG_DMA_NONCOHERENT_CACHE_SYNC
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction direction);
-#else
-static inline void arch_dma_cache_sync(struct device *dev, void *vaddr,
-		size_t size, enum dma_data_direction direction)
-{
-}
-#endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
-
 #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 6cf7f7947ae797..98417f8eef92b2 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -74,9 +74,6 @@ config ARCH_HAS_DMA_PREP_COHERENT
 config ARCH_HAS_FORCE_DMA_UNENCRYPTED
 	bool
 
-config DMA_NONCOHERENT_CACHE_SYNC
-	bool
-
 config DMA_VIRT_OPS
 	bool
 	depends on HAS_DMA
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index dacdb7226caacd..81b0492332d4c8 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -555,20 +555,6 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
 EXPORT_SYMBOL(dma_set_coherent_mask);
 #endif
 
-void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction dir)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	if (dma_alloc_direct(dev, ops))
-		arch_dma_cache_sync(dev, vaddr, size, dir);
-	else if (ops->cache_sync)
-		ops->cache_sync(dev, vaddr, size, dir);
-}
-EXPORT_SYMBOL(dma_cache_sync);
-
 size_t dma_max_mapping_size(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 26/28] dmapool: add dma_alloc_pages support
Add an new variant of a dmapool that uses non-coherent memory from
dma_alloc_pages.  Unlike the existing mempool_create this one
initialized a pool allocated by the caller to avoid a pointless extra
allocation.  At some point it might be worth to also switch the coherent
allocation over to a similar dma_pool_init_coherent helper, but that is
better done as a separate series including a few conversions.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 include/linux/dmapool.h |  23 ++++-
 mm/dmapool.c            | 211 +++++++++++++++++++++++++---------------
 2 files changed, 154 insertions(+), 80 deletions(-)
diff --git a/include/linux/dmapool.h b/include/linux/dmapool.h
index f632ecfb423840..1387525c4e52e8 100644
--- a/include/linux/dmapool.h
+++ b/include/linux/dmapool.h
@@ -11,6 +11,10 @@
 #ifndef LINUX_DMAPOOL_H
 #define	LINUX_DMAPOOL_H
 
+#include <linux/dma-direction.h>
+#include <linux/gfp.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
 #include <linux/scatterlist.h>
 #include <asm/io.h>
 
@@ -18,11 +22,28 @@ struct device;
 
 #ifdef CONFIG_HAS_DMA
 
+struct dma_pool {		/* the pool */
+	struct list_head page_list;
+	spinlock_t lock;
+	size_t size;
+	struct device *dev;
+	size_t allocation;
+	size_t boundary;
+	bool is_coherent;
+	enum dma_data_direction dir;
+	char name[32];
+	struct list_head pools;
+};
+
 struct dma_pool *dma_pool_create(const char *name, struct device *dev, 
 			size_t size, size_t align, size_t allocation);
-
 void dma_pool_destroy(struct dma_pool *pool);
 
+int dma_pool_init(struct device *dev, struct dma_pool *pool, const char *name,
+		size_t size, size_t align, size_t boundary,
+		enum dma_data_direction dir);
+void dma_pool_exit(struct dma_pool *pool);
+
 void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 		     dma_addr_t *handle);
 void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t addr);
diff --git a/mm/dmapool.c b/mm/dmapool.c
index f9fb9bbd733e0f..c60a48b22c8d6a 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -6,10 +6,10 @@
  * Copyright 2007 Intel Corporation
  *   Author: Matthew Wilcox <willy at linux.intel.com>
  *
- * This allocator returns small blocks of a given size which are DMA-able by
- * the given device.  It uses the dma_alloc_coherent page allocator to get
- * new pages, then splits them up into blocks of the required size.
- * Many older drivers still have their own code to do this.
+ * This allocator returns small blocks of a given size which are DMA-able by
the
+ * given device.  It either uses the dma_alloc_coherent or the dma_alloc_pages
+ * allocator to get new pages, then splits them up into blocks of the required
+ * size.
  *
  * The current design of this allocator is fairly simple.  The pool is
  * represented by the 'struct dma_pool' which keeps a doubly-linked
list of
@@ -39,17 +39,6 @@
 #define DMAPOOL_DEBUG 1
 #endif
 
-struct dma_pool {		/* the pool */
-	struct list_head page_list;
-	spinlock_t lock;
-	size_t size;
-	struct device *dev;
-	size_t allocation;
-	size_t boundary;
-	char name[32];
-	struct list_head pools;
-};
-
 struct dma_page {		/* cacheable header for 'allocation' bytes */
 	struct list_head page_list;
 	void *vaddr;
@@ -104,74 +93,40 @@ show_pools(struct device *dev, struct device_attribute
*attr, char *buf)
 
 static DEVICE_ATTR(pools, 0444, show_pools, NULL);
 
-/**
- * dma_pool_create - Creates a pool of consistent memory blocks, for dma.
- * @name: name of pool, for diagnostics
- * @dev: device that will be doing the DMA
- * @size: size of the blocks in this pool.
- * @align: alignment requirement for blocks; must be a power of two
- * @boundary: returned blocks won't cross this power of two boundary
- * Context: not in_interrupt()
- *
- * Given one of these pools, dma_pool_alloc()
- * may be used to allocate memory.  Such memory will all have
"consistent"
- * DMA mappings, accessible by the device and its driver without using
- * cache flushing primitives.  The actual size of blocks allocated may be
- * larger than requested because of alignment.
- *
- * If @boundary is nonzero, objects returned from dma_pool_alloc() won't
- * cross that size boundary.  This is useful for devices which have
- * addressing restrictions on individual DMA transfers, such as not crossing
- * boundaries of 4KBytes.
- *
- * Return: a dma allocation pool with the requested characteristics, or
- * %NULL if one can't be created.
- */
-struct dma_pool *dma_pool_create(const char *name, struct device *dev,
-				 size_t size, size_t align, size_t boundary)
+static int __dma_pool_init(struct device *dev, struct dma_pool *pool,
+		const char *name, size_t size, size_t align, size_t boundary)
 {
-	struct dma_pool *retval;
 	size_t allocation;
 	bool empty = false;
 
 	if (align == 0)
 		align = 1;
-	else if (align & (align - 1))
-		return NULL;
+	if (align & (align - 1))
+		return -EINVAL;
 
 	if (size == 0)
-		return NULL;
-	else if (size < 4)
-		size = 4;
-
-	size = ALIGN(size, align);
+		return -EINVAL;
+	size = ALIGN(min_t(size_t, size, 4), align);
 	allocation = max_t(size_t, size, PAGE_SIZE);
 
 	if (!boundary)
 		boundary = allocation;
-	else if ((boundary < size) || (boundary & (boundary - 1)))
-		return NULL;
-
-	retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, dev_to_node(dev));
-	if (!retval)
-		return retval;
-
-	strlcpy(retval->name, name, sizeof(retval->name));
-
-	retval->dev = dev;
-
-	INIT_LIST_HEAD(&retval->page_list);
-	spin_lock_init(&retval->lock);
-	retval->size = size;
-	retval->boundary = boundary;
-	retval->allocation = allocation;
-
-	INIT_LIST_HEAD(&retval->pools);
+	if (boundary < size || (boundary & (boundary - 1)))
+		return -EINVAL;
+
+	strlcpy(pool->name, name, sizeof(pool->name));
+	pool->dev = dev;
+	INIT_LIST_HEAD(&pool->page_list);
+	spin_lock_init(&pool->lock);
+	pool->size = size;
+	pool->boundary = boundary;
+	pool->allocation = allocation;
+	INIT_LIST_HEAD(&pool->pools);
 
 	/*
 	 * pools_lock ensures that the ->dma_pools list does not get corrupted.
 	 * pools_reg_lock ensures that there is not a race between
-	 * dma_pool_create() and dma_pool_destroy() or within dma_pool_create()
+	 * __dma_pool_init() and dma_pool_exit() or within dma_pool_create()
 	 * when the first invocation of dma_pool_create() failed on
 	 * device_create_file() and the second assumes that it has been done (I
 	 * know it is a short window).
@@ -180,7 +135,7 @@ struct dma_pool *dma_pool_create(const char *name, struct
device *dev,
 	mutex_lock(&pools_lock);
 	if (list_empty(&dev->dma_pools))
 		empty = true;
-	list_add(&retval->pools, &dev->dma_pools);
+	list_add(&pool->pools, &dev->dma_pools);
 	mutex_unlock(&pools_lock);
 	if (empty) {
 		int err;
@@ -188,18 +143,94 @@ struct dma_pool *dma_pool_create(const char *name, struct
device *dev,
 		err = device_create_file(dev, &dev_attr_pools);
 		if (err) {
 			mutex_lock(&pools_lock);
-			list_del(&retval->pools);
+			list_del(&pool->pools);
 			mutex_unlock(&pools_lock);
 			mutex_unlock(&pools_reg_lock);
-			kfree(retval);
-			return NULL;
+			return err;
 		}
 	}
 	mutex_unlock(&pools_reg_lock);
-	return retval;
+	return 0;
+}
+
+/**
+ * dma_pool_create - Creates a pool of consistent memory blocks, for dma.
+ * @name: name of pool, for diagnostics
+ * @dev: device that will be doing the DMA
+ * @size: size of the blocks in this pool.
+ * @align: alignment requirement for blocks; must be a power of two
+ * @boundary: returned blocks won't cross this power of two boundary
+ * Context: not in_interrupt()
+ *
+ * Given one of these pools, dma_pool_alloc()
+ * may be used to allocate memory.  Such memory will all have
"consistent"
+ * DMA mappings, accessible by the device and its driver without using
+ * cache flushing primitives.  The actual size of blocks allocated may be
+ * larger than requested because of alignment.
+ *
+ * If @boundary is nonzero, objects returned from dma_pool_alloc() won't
+ * cross that size boundary.  This is useful for devices which have
+ * addressing restrictions on individual DMA transfers, such as not crossing
+ * boundaries of 4KBytes.
+ *  Return: a dma allocation pool with the requested characteristics, or
+ * %NULL if one can't be created.
+ */
+struct dma_pool *dma_pool_create(const char *name, struct device *dev,
+				 size_t size, size_t align, size_t boundary)
+{
+	struct dma_pool *pool;
+
+	pool = kmalloc_node(sizeof(*pool), GFP_KERNEL, dev_to_node(dev));
+	if (!pool)
+		return NULL;
+	if (__dma_pool_init(dev, pool, name, size, align, boundary))
+		goto out_free_pool;
+	pool->is_coherent = true;
+	return pool;
+out_free_pool:
+	kfree(pool);
+	return NULL;
 }
 EXPORT_SYMBOL(dma_pool_create);
 
+/**
+ * dma_pool_init - initialize a pool DMA addressable memory
+ * @dev:	device that will be doing the DMA
+ * @pool:	pool to initialize
+ * @name:	name of pool, for diagnostics
+ * @size:	size of the blocks in this pool.
+ * @align:	alignment requirement for blocks; must be a power of two
+ * @boundary:	returned blocks won't cross this power of two boundary
+ * @dir:	DMA direction the allocations are going to be used for
+ *
+ * Context:	not in_interrupt()
+ *
+ * Given one of these pools, dma_pool_alloc() may be used to allocate memory.
+ * Such memory will have the same semantics as memory returned from
+ * dma_alloc_pages(), that is ownership needs to be transferred to and from the
+ * device.  The actual size of blocks allocated may be larger than requested
+ * because of alignment.
+ *
+ * If @boundary is nonzero, objects returned from dma_pool_alloc() won't
+ * cross that size boundary.  This is useful for devices which have
+ * addressing restrictions on individual DMA transfers, such as not crossing
+ * boundaries of 4KBytes.
+ */
+int dma_pool_init(struct device *dev, struct dma_pool *pool, const char *name,
+		size_t size, size_t align, size_t boundary,
+		enum dma_data_direction dir)
+{
+	int ret;
+
+	ret = __dma_pool_init(dev, pool, name, size, align, boundary);
+	if (ret)
+		return ret;
+	pool->is_coherent = false;
+	pool->dir = dir;
+	return 0;
+}
+EXPORT_SYMBOL(dma_pool_init);
+
 static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 {
 	unsigned int offset = 0;
@@ -223,8 +254,12 @@ static struct dma_page *pool_alloc_page(struct dma_pool
*pool, gfp_t mem_flags)
 	page = kmalloc(sizeof(*page), mem_flags);
 	if (!page)
 		return NULL;
-	page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
-					 &page->dma, mem_flags);
+	if (pool->is_coherent)
+		page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
+						 &page->dma, mem_flags);
+	else
+		page->vaddr = dma_alloc_pages(pool->dev, pool->allocation,
+					      &page->dma, pool->dir, mem_flags);
 	if (page->vaddr) {
 #ifdef	DMAPOOL_DEBUG
 		memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
@@ -251,20 +286,25 @@ static void pool_free_page(struct dma_pool *pool, struct
dma_page *page)
 #ifdef	DMAPOOL_DEBUG
 	memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
-	dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma);
+	if (pool->is_coherent)
+		dma_free_coherent(pool->dev, pool->allocation, page->vaddr,
+				  dma);
+	else
+		dma_free_pages(pool->dev, pool->allocation, page->vaddr, dma,
+			       pool->dir);
 	list_del(&page->page_list);
 	kfree(page);
 }
 
 /**
- * dma_pool_destroy - destroys a pool of dma memory blocks.
+ * dma_pool_exit - destroys a pool of dma memory blocks.
  * @pool: dma pool that will be destroyed
  * Context: !in_interrupt()
  *
- * Caller guarantees that no more memory from the pool is in use,
- * and that nothing will try to use the pool after this call.
+ * Caller guarantees that no more memory from the pool is in use, and that
+ * nothing will try to use the pool after this call.
  */
-void dma_pool_destroy(struct dma_pool *pool)
+void dma_pool_exit(struct dma_pool *pool)
 {
 	bool empty = false;
 
@@ -299,7 +339,20 @@ void dma_pool_destroy(struct dma_pool *pool)
 		} else
 			pool_free_page(pool, page);
 	}
+}
+EXPORT_SYMBOL(dma_pool_exit);
 
+/**
+ * dma_pool_destroy - destroys a pool of dma memory blocks.
+ * @pool: dma pool that will be destroyed
+ * Context: !in_interrupt()
+ *
+ * Caller guarantees that no more memory from the pool is in use,
+ * and that nothing will try to use the pool after this call.
+ */
+void dma_pool_destroy(struct dma_pool *pool)
+{
+	dma_pool_exit(pool);
 	kfree(pool);
 }
 EXPORT_SYMBOL(dma_pool_destroy);
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 27/28] nvme-pci: fix PRP pool size
All operations are based on the controller, not the host page size.
Switch the dma pool to use the controller page size as well to avoid
massive overallocations on large page size systems.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ba725ae47305ef..a33adab62acbaf 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2460,7 +2460,8 @@ static int nvme_disable_prepare_reset(struct nvme_dev
*dev, bool shutdown)
 static int nvme_setup_prp_pools(struct nvme_dev *dev)
 {
 	dev->prp_page_pool = dma_pool_create("prp list page",
dev->dev,
-						PAGE_SIZE, PAGE_SIZE, 0);
+						NVME_CTRL_PAGE_SIZE,
+						NVME_CTRL_PAGE_SIZE, 0);
 	if (!dev->prp_page_pool)
 		return -ENOMEM;
 
-- 
2.28.0
Christoph Hellwig
2020-Aug-19  06:55 UTC
[Nouveau] [PATCH 28/28] nvme-pci: use dma_alloc_pages backed dmapools
Switch from coherent DMA pools to those backed by dma_alloc_pages.  This
helps device with non-coherent DMA to avoid host accesses to uncached
memory for every submission of a larger than single entry I/O.
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 80 ++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 40 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a33adab62acbaf..fb34dbcb973673 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -114,8 +114,8 @@ struct nvme_dev {
 	struct blk_mq_tag_set admin_tagset;
 	u32 __iomem *dbs;
 	struct device *dev;
-	struct dma_pool *prp_page_pool;
-	struct dma_pool *prp_small_pool;
+	struct dma_pool prp_page_pool;
+	struct dma_pool prp_small_pool;
 	unsigned online_queues;
 	unsigned max_qid;
 	unsigned io_queues[HCTX_MAX_TYPES];
@@ -536,7 +536,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct
request *req)
 
 
 	if (iod->npages == 0)
-		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
+		dma_pool_free(&dev->prp_small_pool, nvme_pci_iod_list(req)[0],
 			dma_addr);
 
 	for (i = 0; i < iod->npages; i++) {
@@ -553,7 +553,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct
request *req)
 			next_dma_addr = le64_to_cpu(prp_list[last_prp]);
 		}
 
-		dma_pool_free(dev->prp_page_pool, addr, dma_addr);
+		dma_pool_free(&dev->prp_page_pool, addr, dma_addr);
 		dma_addr = next_dma_addr;
 	}
 
@@ -611,10 +611,10 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev
*dev,
 
 	nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
 	if (nprps <= (256 / 8)) {
-		pool = dev->prp_small_pool;
+		pool = &dev->prp_small_pool;
 		iod->npages = 0;
 	} else {
-		pool = dev->prp_page_pool;
+		pool = &dev->prp_page_pool;
 		iod->npages = 1;
 	}
 
@@ -630,6 +630,11 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev
*dev,
 	for (;;) {
 		if (i == NVME_CTRL_PAGE_SIZE >> 3) {
 			__le64 *old_prp_list = prp_list;
+
+			dma_sync_single_for_device(dev->dev, prp_dma,
+						   i * sizeof(*prp_list),
+						   DMA_TO_DEVICE);
+
 			prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
 			if (!prp_list)
 				return BLK_STS_RESOURCE;
@@ -653,6 +658,8 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev
*dev,
 		dma_len = sg_dma_len(sg);
 	}
 
+	dma_sync_single_for_device(dev->dev, prp_dma, i * sizeof(*prp_list),
+				   DMA_TO_DEVICE);
 done:
 	cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
 	cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
@@ -706,10 +713,10 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev
*dev,
 	}
 
 	if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
-		pool = dev->prp_small_pool;
+		pool = &dev->prp_small_pool;
 		iod->npages = 0;
 	} else {
-		pool = dev->prp_page_pool;
+		pool = &dev->prp_page_pool;
 		iod->npages = 1;
 	}
 
@@ -728,6 +735,10 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev
*dev,
 		if (i == SGES_PER_PAGE) {
 			struct nvme_sgl_desc *old_sg_desc = sg_list;
 			struct nvme_sgl_desc *link = &old_sg_desc[i - 1];
+	
+			dma_sync_single_for_device(dev->dev, sgl_dma,
+						   i * sizeof(*sg_list),
+						   DMA_TO_DEVICE);
 
 			sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
 			if (!sg_list)
@@ -743,6 +754,8 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev
*dev,
 		sg = sg_next(sg);
 	} while (--entries > 0);
 
+	dma_sync_single_for_device(dev->dev, sgl_dma, i * sizeof(*sg_list),
+				   DMA_TO_DEVICE);
 	return BLK_STS_OK;
 }
 
@@ -2457,30 +2470,6 @@ static int nvme_disable_prepare_reset(struct nvme_dev
*dev, bool shutdown)
 	return 0;
 }
 
-static int nvme_setup_prp_pools(struct nvme_dev *dev)
-{
-	dev->prp_page_pool = dma_pool_create("prp list page",
dev->dev,
-						NVME_CTRL_PAGE_SIZE,
-						NVME_CTRL_PAGE_SIZE, 0);
-	if (!dev->prp_page_pool)
-		return -ENOMEM;
-
-	/* Optimisation for I/Os between 4k and 128k */
-	dev->prp_small_pool = dma_pool_create("prp list 256",
dev->dev,
-						256, 256, 0);
-	if (!dev->prp_small_pool) {
-		dma_pool_destroy(dev->prp_page_pool);
-		return -ENOMEM;
-	}
-	return 0;
-}
-
-static void nvme_release_prp_pools(struct nvme_dev *dev)
-{
-	dma_pool_destroy(dev->prp_page_pool);
-	dma_pool_destroy(dev->prp_small_pool);
-}
-
 static void nvme_free_tagset(struct nvme_dev *dev)
 {
 	if (dev->tagset.tags)
@@ -2851,10 +2840,6 @@ static int nvme_probe(struct pci_dev *pdev, const struct
pci_device_id *id)
 	INIT_WORK(&dev->remove_work, nvme_remove_dead_ctrl_work);
 	mutex_init(&dev->shutdown_lock);
 
-	result = nvme_setup_prp_pools(dev);
-	if (result)
-		goto unmap;
-
 	quirks |= check_vendor_combination_bug(pdev);
 
 	if (!noacpi && nvme_acpi_storage_d3(pdev)) {
@@ -2867,6 +2852,18 @@ static int nvme_probe(struct pci_dev *pdev, const struct
pci_device_id *id)
 		quirks |= NVME_QUIRK_SIMPLE_SUSPEND;
 	}
 
+	result = dma_pool_init(dev->dev, &dev->prp_page_pool, "prp list
page",
+			NVME_CTRL_PAGE_SIZE, NVME_CTRL_PAGE_SIZE, 0,
+			DMA_TO_DEVICE);
+	if (result)
+		goto unmap;
+
+	/* Optimisation for I/Os between 4k and 128k */
+	result = dma_pool_init(dev->dev, &dev->prp_small_pool, "prp
list 256",
+			256, 256, 0, DMA_TO_DEVICE);
+	if (result)
+		goto release_prp_page_pool;
+
 	/*
 	 * Double check that our mempool alloc size will cover the biggest
 	 * command we support.
@@ -2880,7 +2877,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct
pci_device_id *id)
 						GFP_KERNEL, node);
 	if (!dev->iod_mempool) {
 		result = -ENOMEM;
-		goto release_pools;
+		goto release_prp_small_pool;
 	}
 
 	result = nvme_init_ctrl(&dev->ctrl, &pdev->dev,
&nvme_pci_ctrl_ops,
@@ -2897,8 +2894,10 @@ static int nvme_probe(struct pci_dev *pdev, const struct
pci_device_id *id)
 
  release_mempool:
 	mempool_destroy(dev->iod_mempool);
- release_pools:
-	nvme_release_prp_pools(dev);
+ release_prp_small_pool:
+	dma_pool_exit(&dev->prp_small_pool);
+ release_prp_page_pool:
+	dma_pool_exit(&dev->prp_page_pool);
  unmap:
 	nvme_dev_unmap(dev);
  put_pci:
@@ -2963,7 +2962,8 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
-	nvme_release_prp_pools(dev);
+	dma_pool_exit(&dev->prp_small_pool);
+	dma_pool_exit(&dev->prp_page_pool);
 	nvme_dev_unmap(dev);
 	nvme_uninit_ctrl(&dev->ctrl);
 }
-- 
2.28.0
Tomasz Figa
2020-Aug-19  11:16 UTC
[Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
Hi Christoph, On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch at lst.de> wrote:> > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,Could you explain what makes you think it's unused? It's a feature of the UAPI generally supported by the videobuf2 framework and relied on by Chromium OS to get any kind of reasonable performance when accessing V4L2 buffers in the userspace.> and causes > weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is > unimplemented except on PARISC and some MIPS configs, and about to be > removed.It is implemented by the generic DMA mapping layer [1], which is used by a number of architectures including ARM64 and supposed to be used by new architectures going forward. [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341 When removing features from generic kernel code, I'd suggest first providing viable alternatives for its users, rather than killing the users altogether. Given the above, I'm afraid I have to NAK this. Best regards, Tomasz> > Signed-off-by: Christoph Hellwig <hch at lst.de> > --- > .../userspace-api/media/v4l/buffer.rst | 17 --------- > .../media/v4l/vidioc-reqbufs.rst | 1 - > .../media/common/videobuf2/videobuf2-core.c | 36 +------------------ > .../common/videobuf2/videobuf2-dma-contig.c | 19 ---------- > .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +- > .../media/common/videobuf2/videobuf2-v4l2.c | 12 ------- > include/media/videobuf2-core.h | 3 +- > include/uapi/linux/videodev2.h | 2 -- > 8 files changed, 3 insertions(+), 90 deletions(-) > > diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst > index 57e752aaf414a7..2044ed13cd9d7d 100644 > --- a/Documentation/userspace-api/media/v4l/buffer.rst > +++ b/Documentation/userspace-api/media/v4l/buffer.rst > @@ -701,23 +701,6 @@ Memory Consistency Flags > :stub-columns: 0 > :widths: 3 1 4 > > - * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`: > - > - - ``V4L2_FLAG_MEMORY_NON_CONSISTENT`` > - - 0x00000001 > - - A buffer is allocated either in consistent (it will be automatically > - coherent between the CPU and the bus) or non-consistent memory. The > - latter can provide performance gains, for instance the CPU cache > - sync/flush operations can be avoided if the buffer is accessed by the > - corresponding device only and the CPU does not read/write to/from that > - buffer. However, this requires extra care from the driver -- it must > - guarantee memory consistency by issuing a cache flush/sync when > - consistency is needed. If this flag is set V4L2 will attempt to > - allocate the buffer in non-consistent memory. The flag takes effect > - only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the > - queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS > - <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability. > - > .. c:type:: v4l2_memory > > enum v4l2_memory > diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > index 75d894d9c36c42..3180c111d368ee 100644 > --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit > - This capability is set by the driver to indicate that the queue supports > cache and memory management hints. However, it's only valid when the > queue is used for :ref:`memory mapping <mmap>` streaming I/O. See > - :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`, > :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and > :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`. > > diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c > index f544d3393e9d6b..66a41cef33c1b1 100644 > --- a/drivers/media/common/videobuf2/videobuf2-core.c > +++ b/drivers/media/common/videobuf2/videobuf2-core.c > @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q, > } > EXPORT_SYMBOL(vb2_verify_memory_type); > > -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem) > -{ > - q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT; > - > - if (!vb2_queue_allows_cache_hints(q)) > - return; > - if (!consistent_mem) > - q->dma_attrs |= DMA_ATTR_NON_CONSISTENT; > -} > - > -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem) > -{ > - bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT); > - > - if (consistent_mem != queue_is_consistent) { > - dprintk(q, 1, "memory consistency model mismatch\n"); > - return false; > - } > - return true; > -} > - > int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > unsigned int flags, unsigned int *count) > { > unsigned int num_buffers, allocated_buffers, num_planes = 0; > unsigned plane_sizes[VB2_MAX_PLANES] = { }; > - bool consistent_mem = true; > unsigned int i; > int ret; > > - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) > - consistent_mem = false; > - > if (q->streaming) { > dprintk(q, 1, "streaming active\n"); > return -EBUSY; > @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > } > > if (*count == 0 || q->num_buffers != 0 || > - (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) || > - !verify_consistency_attr(q, consistent_mem)) { > + (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) { > /* > * We already have buffers allocated, so first check if they > * are not in use and can be freed. > @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME); > memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); > q->memory = memory; > - set_queue_consistency(q, consistent_mem); > > /* > * Ask the driver how many buffers and planes per buffer it requires. > @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, > { > unsigned int num_planes = 0, num_buffers, allocated_buffers; > unsigned plane_sizes[VB2_MAX_PLANES] = { }; > - bool consistent_mem = true; > int ret; > > - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) > - consistent_mem = false; > - > if (q->num_buffers == VB2_MAX_FRAME) { > dprintk(q, 1, "maximum number of buffers already allocated\n"); > return -ENOBUFS; > @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, > } > memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); > q->memory = memory; > - set_queue_consistency(q, consistent_mem); > q->waiting_for_buffers = !q->is_output; > } else { > if (q->memory != memory) { > dprintk(q, 1, "memory model mismatch\n"); > return -EINVAL; > } > - if (!verify_consistency_attr(q, consistent_mem)) > - return -EINVAL; > } > > num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers); > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > index ec3446cc45b8da..7b1b86ec942d7d 100644 > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > @@ -42,11 +42,6 @@ struct vb2_dc_buf { > struct dma_buf_attachment *db_attach; > }; > > -static inline bool vb2_dc_buffer_consistent(unsigned long attr) > -{ > - return !(attr & DMA_ATTR_NON_CONSISTENT); > -} > - > /*********************************************/ > /* scatterlist table functions */ > /*********************************************/ > @@ -341,13 +336,6 @@ static int > vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, > enum dma_data_direction direction) > { > - struct vb2_dc_buf *buf = dbuf->priv; > - struct sg_table *sgt = buf->dma_sgt; > - > - if (vb2_dc_buffer_consistent(buf->attrs)) > - return 0; > - > - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); > return 0; > } > > @@ -355,13 +343,6 @@ static int > vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, > enum dma_data_direction direction) > { > - struct vb2_dc_buf *buf = dbuf->priv; > - struct sg_table *sgt = buf->dma_sgt; > - > - if (vb2_dc_buffer_consistent(buf->attrs)) > - return 0; > - > - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); > return 0; > } > > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644 > --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c > +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs, > /* > * NOTE: dma-sg allocates memory using the page allocator directly, so > * there is no memory consistency guarantee, hence dma-sg ignores DMA > - * attributes passed from the upper layer. That means that > - * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers. > + * attributes passed from the upper layer. > */ > buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *), > GFP_KERNEL | __GFP_ZERO); > diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c > index 30caad27281e1a..de83ad48783821 100644 > --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c > +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c > @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps) > #endif > } > > -static void clear_consistency_attr(struct vb2_queue *q, > - int memory, > - unsigned int *flags) > -{ > - if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) > - *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT; > -} > - > int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req) > { > int ret = vb2_verify_memory_type(q, req->memory, req->type); > > fill_buf_caps(q, &req->capabilities); > - clear_consistency_attr(q, req->memory, &req->flags); > return ret ? ret : vb2_core_reqbufs(q, req->memory, > req->flags, &req->count); > } > @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create) > unsigned i; > > fill_buf_caps(q, &create->capabilities); > - clear_consistency_attr(q, create->memory, &create->flags); > create->index = q->num_buffers; > if (create->count == 0) > return ret != -EBUSY ? ret : 0; > @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv, > int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type); > > fill_buf_caps(vdev->queue, &p->capabilities); > - clear_consistency_attr(vdev->queue, p->memory, &p->flags); > if (res) > return res; > if (vb2_queue_is_busy(vdev, file)) > @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv, > > p->index = vdev->queue->num_buffers; > fill_buf_caps(vdev->queue, &p->capabilities); > - clear_consistency_attr(vdev->queue, p->memory, &p->flags); > /* > * If count == 0, then just check if memory and type are valid. > * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0. > diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h > index 52ef92049073e3..4c7f25b07e9375 100644 > --- a/include/media/videobuf2-core.h > +++ b/include/media/videobuf2-core.h > @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb); > * vb2_core_reqbufs() - Initiate streaming. > * @q: pointer to &struct vb2_queue with videobuf2 queue. > * @memory: memory type, as defined by &enum vb2_memory. > - * @flags: auxiliary queue/buffer management flags. Currently, the only > - * used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT. > + * @flags: auxiliary queue/buffer management flags. > * @count: requested buffer count. > * > * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h > index c7b70ff53bc1dd..5c00f63d9c1b58 100644 > --- a/include/uapi/linux/videodev2.h > +++ b/include/uapi/linux/videodev2.h > @@ -191,8 +191,6 @@ enum v4l2_memory { > V4L2_MEMORY_DMABUF = 4, > }; > > -#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1 << 0) > - > /* see also http://vektor.theorem.ca/graphics/ycbcr/ */ > enum v4l2_colorspace { > /* > -- > 2.28.0 > > _______________________________________________ > iommu mailing list > iommu at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu
Tomasz Figa
2020-Aug-19  15:03 UTC
[Nouveau] [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages
Hi Christoph, On Wed, Aug 19, 2020 at 8:57 AM Christoph Hellwig <hch at lst.de> wrote:> > Add a new API to allocate and free pages that are guaranteed to be > addressable by a device, but otherwise behave like pages allocated by > alloc_pages. The intended APIs to sync them for use with the device > and cpu are dma_sync_single_for_{device,cpu} that are also used for > streaming mappings. > > Switch all drivers over to this new API, but keep the usage of the > crufty dma_cache_sync API for now, which will be cleaned up on a driver > by driver basis. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > --- > Documentation/core-api/dma-api.rst | 68 +++++++++++------------ > Documentation/core-api/dma-attributes.rst | 8 --- > arch/alpha/kernel/pci_iommu.c | 2 + > arch/arm/mm/dma-mapping-nommu.c | 2 + > arch/arm/mm/dma-mapping.c | 4 ++ > arch/ia64/hp/common/sba_iommu.c | 2 + > arch/mips/jazz/jazzdma.c | 7 +-- > arch/powerpc/kernel/dma-iommu.c | 2 + > arch/powerpc/platforms/ps3/system-bus.c | 4 ++ > arch/powerpc/platforms/pseries/vio.c | 2 + > arch/s390/pci/pci_dma.c | 2 + > arch/x86/kernel/amd_gart_64.c | 2 + > drivers/iommu/dma-iommu.c | 2 + > drivers/iommu/intel/iommu.c | 4 ++ > drivers/net/ethernet/i825xx/lasi_82596.c | 13 ++--- > drivers/net/ethernet/seeq/sgiseeq.c | 12 ++-- > drivers/parisc/ccio-dma.c | 2 + > drivers/parisc/sba_iommu.c | 2 + > drivers/scsi/53c700.c | 8 +-- > drivers/scsi/sgiwd93.c | 12 ++-- > drivers/xen/swiotlb-xen.c | 2 + > include/linux/dma-direct.h | 5 ++ > include/linux/dma-mapping.h | 29 ++++++++-- > include/linux/dma-noncoherent.h | 3 - > kernel/dma/direct.c | 51 ++++++++++++++++- > kernel/dma/mapping.c | 43 +++++++++++++- > kernel/dma/ops_helpers.c | 35 ++++++++++++ > kernel/dma/virt.c | 2 + > sound/mips/hal2.c | 20 +++---- > 29 files changed, 254 insertions(+), 96 deletions(-) >Thanks for the patch. The general design looks quite nice, but please see my comments inline.> diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst > index 90239348b30f6f..047fcfffa0e5cf 100644 > --- a/Documentation/core-api/dma-api.rst > +++ b/Documentation/core-api/dma-api.rst > @@ -516,48 +516,53 @@ routines, e.g.::: > } > > > -Part II - Advanced dma usage > ----------------------------- > +Part II - Non-coherent DMA allocations > +-------------------------------------- > > -Warning: These pieces of the DMA API should not be used in the > -majority of cases, since they cater for unlikely corner cases that > -don't belong in usual drivers. > +These APIs allow to allocate pages that can be used like normal pages > +in the kernel direct mapping, but are guaranteed to be DMA addressable.Could we elaborate a bit more on what "like normal pages in kernel direct mapping" mean from the driver perspective?> > If you don't understand how cache line coherency works between a > processor and an I/O device, you should not be using this part of the > -API at all. > +API. > > :: > > void * > - dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle, > - gfp_t flag, unsigned long attrs) > + dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle, > + enum dma_data_direction dir, gfp_t gfp) > + > +This routine allocates a region of <size> bytes of consistent memory. It > +returns a pointer to the allocated region (in the processor's virtual address > +space) or NULL if the allocation failed. The returned memory is guanteed to > +behave like memory allocated using alloc_pages.There is one aspect that the existing dma_alloc_attrs() handles, but this new function doesn't: IOMMU support. The function will always allocate a physically-contiguous block memory, which is a costly operation and not even guaranteed to succeed, even if enough free memory is available. Modern SoCs employ IOMMUs to avoid the need to allocate physically-contiguous memory and those happen to be also the devices that could benefit from non-coherent allocations a lot. One of the tasks of the DMA API was making it possible to allocate suitable memory for a given device, without having the driver know about the SoC integration details, such as the presence of an IOMMU. Today, dma_alloc_attrs() uses the .alloc callback of the dma_ops struct and the IOMMU-aware implementations, like the dma-iommu helpers [1], would allocate discontiguous pages. Therefore, while I see the DMA-aware page allocation functionality as a useful functionality on its own for scatter-gather-capable hardware, I believe it is not a complete replacement for dma_alloc_attrs() with the DMA_ATTR_NON_CONSISTENT attribute. [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/iommu/dma-iommu.c#L510 Best regards, Tomasz
Marek Szyprowski
2020-Aug-25  11:30 UTC
[Nouveau] a saner API for allocating DMA addressable pages
Hi Christoph, On 19.08.2020 08:55, Christoph Hellwig wrote:> this series replaced the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs > with a separate new dma_alloc_pages API, which is available on all > platforms. In addition to cleaning up the convoluted code path, this > ensures that other drivers that have asked for better support for > non-coherent DMA to pages with incurring bounce buffering over can finally > be properly supported. > > I'm still a little unsure about the API naming, as alloc_pages sort of > implies a struct page return value, but we return a kernel virtual > address. The other alternative would be to name the API > dma_alloc_noncoherent, but the whole non-coherent naming seems to put > people off. As a follow up I plan to move the implementation of the > DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given > that is also is a fundamentally non coherent allocation. The replacement > for that flag would then return a struct page, as it is allowed to > actually return pages without a kernel mapping as the name suggested > (although most of the time they will actually have a kernel mapping..) > > In addition to the conversions of the existing non-coherent DMA users > the last three patches also convert the DMA coherent allocations in > the NVMe driver to use this new framework through a dmapool addition. > This was both to give me a good testing vehicle, but also because it > should speed up the NVMe driver on platforms with non-coherent DMA > nicely, without a downside on platforms with cache coherent DMA.I really wonder what is the difference between this new API and alloc_pages(GFP_DMA, n). Is this API really needed? I thought that this is legacy thing to be removed one day... Maybe it would make more sense to convert the few remaining drivers to regular dma_map_page()/dma_sync_*()/dma_unmap_page() or have I missed something? Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland
Christoph Hellwig
2020-Aug-25  13:26 UTC
[Nouveau] a saner API for allocating DMA addressable pages
On Tue, Aug 25, 2020 at 01:30:41PM +0200, Marek Szyprowski wrote:> I really wonder what is the difference between this new API and > alloc_pages(GFP_DMA, n). Is this API really needed? I thought that this > is legacy thing to be removed one day...The difference is that the pages returned are guranteed to be addressable by the devie. This is a very important difference that matters for a lot of use cases.
Helge Deller
2020-Aug-29  09:46 UTC
[Nouveau] a saner API for allocating DMA addressable pages
Hi Christoph, On 19.08.20 08:55, Christoph Hellwig wrote:> this series replaced the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs > with a separate new dma_alloc_pages API, which is available on all > platforms. In addition to cleaning up the convoluted code path, this > ensures that other drivers that have asked for better support for > non-coherent DMA to pages with incurring bounce buffering over can finally > be properly supported. > .... > A git tree is available here: > > git://git.infradead.org/users/hch/misc.git dma_alloc_pagesI've tested this tree on my parisc machine which uses the 53c700 and lasi_82596 drivers. Everything worked as expected, so you may add: Tested-by: Helge Deller <deller at gmx.de> # parisc Thanks! Helge> > Gitweb: > > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages > > > Diffstat: > Documentation/core-api/dma-api.rst | 92 ++---- > Documentation/core-api/dma-attributes.rst | 8 > Documentation/userspace-api/media/v4l/buffer.rst | 17 - > Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst | 1 > arch/alpha/kernel/pci_iommu.c | 2 > arch/arm/include/asm/dma-direct.h | 4 > arch/arm/mm/dma-mapping-nommu.c | 2 > arch/arm/mm/dma-mapping.c | 4 > arch/ia64/Kconfig | 3 > arch/ia64/hp/common/sba_iommu.c | 2 > arch/ia64/kernel/dma-mapping.c | 14 > arch/ia64/mm/init.c | 3 > arch/mips/Kconfig | 1 > arch/mips/bmips/dma.c | 4 > arch/mips/cavium-octeon/dma-octeon.c | 4 > arch/mips/include/asm/dma-direct.h | 4 > arch/mips/include/asm/jazzdma.h | 2 > arch/mips/jazz/jazzdma.c | 102 +------ > arch/mips/loongson2ef/fuloong-2e/dma.c | 4 > arch/mips/loongson2ef/lemote-2f/dma.c | 4 > arch/mips/loongson64/dma.c | 4 > arch/mips/mm/dma-noncoherent.c | 48 +-- > arch/mips/pci/pci-ar2315.c | 4 > arch/mips/pci/pci-xtalk-bridge.c | 4 > arch/mips/sgi-ip32/ip32-dma.c | 4 > arch/parisc/Kconfig | 1 > arch/parisc/kernel/pci-dma.c | 6 > arch/powerpc/include/asm/dma-direct.h | 4 > arch/powerpc/kernel/dma-iommu.c | 2 > arch/powerpc/platforms/ps3/system-bus.c | 4 > arch/powerpc/platforms/pseries/vio.c | 2 > arch/s390/pci/pci_dma.c | 2 > arch/x86/kernel/amd_gart_64.c | 8 > drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 > drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 3 > drivers/iommu/dma-iommu.c | 2 > drivers/iommu/intel/iommu.c | 6 > drivers/media/common/videobuf2/videobuf2-core.c | 36 -- > drivers/media/common/videobuf2/videobuf2-dma-contig.c | 19 - > drivers/media/common/videobuf2/videobuf2-dma-sg.c | 3 > drivers/media/common/videobuf2/videobuf2-v4l2.c | 12 > drivers/net/ethernet/amd/au1000_eth.c | 15 - > drivers/net/ethernet/i825xx/lasi_82596.c | 36 +- > drivers/net/ethernet/i825xx/lib82596.c | 148 +++++----- > drivers/net/ethernet/i825xx/sni_82596.c | 23 - > drivers/net/ethernet/seeq/sgiseeq.c | 24 - > drivers/nvme/host/pci.c | 79 ++--- > drivers/parisc/ccio-dma.c | 2 > drivers/parisc/sba_iommu.c | 2 > drivers/scsi/53c700.c | 120 ++++---- > drivers/scsi/53c700.h | 9 > drivers/scsi/sgiwd93.c | 14 > drivers/xen/swiotlb-xen.c | 2 > include/linux/dma-direct.h | 55 ++- > include/linux/dma-mapping.h | 32 +- > include/linux/dma-noncoherent.h | 21 - > include/linux/dmapool.h | 23 + > include/linux/gfp.h | 6 > include/media/videobuf2-core.h | 3 > include/uapi/linux/videodev2.h | 2 > kernel/dma/Kconfig | 9 > kernel/dma/Makefile | 1 > kernel/dma/coherent.c | 17 + > kernel/dma/direct.c | 112 +++++-- > kernel/dma/mapping.c | 104 ++----- > kernel/dma/ops_helpers.c | 86 ++++++ > kernel/dma/pool.c | 2 > kernel/dma/swiotlb.c | 4 > kernel/dma/virt.c | 2 > mm/dmapool.c | 211 +++++++++------ > sound/mips/hal2.c | 58 +--- > 71 files changed, 872 insertions(+), 803 deletions(-) >
Thomas Bogendoerfer
2020-Sep-01  13:29 UTC
[Nouveau] [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe
On Wed, Aug 19, 2020 at 08:55:33AM +0200, Christoph Hellwig wrote:> This allows us to get rid of the LIB82596_DMA_ATTR defined and prepare > for untangling the coherent vs non-coherent DMA allocation API. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > --- > drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++++++++++------ > drivers/net/ethernet/i825xx/lib82596.c | 36 ++++++++---------------- > drivers/net/ethernet/i825xx/sni_82596.c | 19 +++++++++---- > 3 files changed, 40 insertions(+), 39 deletions(-) > > [...] > diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c > index 22f5887578b2bd..e80e790ffbd4d4 100644 > --- a/drivers/net/ethernet/i825xx/sni_82596.c > +++ b/drivers/net/ethernet/i825xx/sni_82596.c > @@ -24,8 +24,6 @@ > > static const char sni_82596_string[] = "snirm_82596"; > > -#define LIB82596_DMA_ATTR 0 > - > #define DMA_WBACK(priv, addr, len) do { } while (0) > #define DMA_INV(priv, addr, len) do { } while (0) > #define DMA_WBACK_INV(priv, addr, len) do { } while (0) > @@ -134,10 +132,19 @@ static int sni_82596_probe(struct platform_device *dev) > lp->ca = ca_addr; > lp->mpu_port = mpu_addr; > > + lp->dma = dma_alloc_coherent(dev->dev.parent, sizeof(struct i596_dma), > + &lp->dma_addr, GFP_KERNEL);this needs to use &dev->dev as device argument otherwise I get a WARNING: CPU: 0 PID: 1 at linux/kernel/dma/mapping.c:416 dma_alloc_attrs+0x64/0x98 (coherent_dma_mask is set correctly). dev->dev.parent was correct when going from netdevice to underlying device, but now allocation is done via platform_device probe. I wonder why this works for parisc. Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea. [ RFC1925, 2.3 ]
Thomas Bogendoerfer
2020-Sep-01  13:49 UTC
[Nouveau] [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function
On Wed, Aug 19, 2020 at 08:55:36AM +0200, Christoph Hellwig wrote:> Signed-off-by: Christoph Hellwig <hch at lst.de> > --- > arch/mips/include/asm/jazzdma.h | 2 - > arch/mips/jazz/jazzdma.c | 70 --------------------------------- > 2 files changed, 72 deletions(-)Acked-by: Thomas Bogendoerfer <tsbogend at alpha.franken.de> -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea. [ RFC1925, 2.3 ]
Thomas Bogendoerfer
2020-Sep-01  13:49 UTC
[Nouveau] [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct
On Wed, Aug 19, 2020 at 08:55:37AM +0200, Christoph Hellwig wrote:> The jazzdma ops implement support for a very basic IOMMU. Thus we really > should not use the dma-direct code that takes physical address limits > into account. This survived through the great MIPS DMA ops cleanup mostly > because I was lazy, but now it is time to fully split the implementations. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > --- > arch/mips/jazz/jazzdma.c | 32 +++++++++++++++++++++----------- > 1 file changed, 21 insertions(+), 11 deletions(-)Acked-by: Thomas Bogendoerfer <tsbogend at alpha.franken.de> -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea. [ RFC1925, 2.3 ]
Thomas Bogendoerfer
2020-Sep-01  13:53 UTC
[Nouveau] [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous
On Wed, Aug 19, 2020 at 08:55:35AM +0200, Christoph Hellwig wrote:> When transferring DMA ownership back to the CPU there should never > be any writeback from the cache, as the buffer was owned by the > device until now. Instead it should just be invalidated for the > mapping directions where the device could have written data. > Note that the changes rely on the fact that kmap_atomic is stubbed > out for the !HIGHMEM case to simplify the code a bit. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > --- > arch/mips/mm/dma-noncoherent.c | 44 +++++++++++++++++++++------------- > 1 file changed, 28 insertions(+), 16 deletions(-)Acked-by: Thomas Bogendoerfer <tsbogend at alpha.franken.de> -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea. [ RFC1925, 2.3 ]
James Bottomley
2020-Sep-01  14:52 UTC
[Nouveau] [PATCH 07/28] 53c700: improve non-coherent DMA handling
On Wed, 2020-08-19 at 08:55 +0200, Christoph Hellwig wrote:> Switch the 53c700 driver to only use non-coherent descriptor memory > if it really has to because dma_alloc_coherent fails. This doesn't > matter for any of the platforms it runs on currently, but that will > change soon. > > To help with this two new helpers to transfer ownership to and from > the device are added that abstract the syncing of the non-coherent > memory. The two current bidirectional cases are mapped to transfers > to the device, as that appears to what they are used for. Note that > for parisc, which is the only architecture this driver needs to use > non-coherent memory on, the direction argument of dma_cache_sync is > ignored, so this will not change behavior in any way.I think this looks mostly OK, except for one misnamed parameter below. Unfortunately, the last non-coherent parisc was the 700 series and I no longer own a box, so I can't test that part of it (I can fire up the C360 to test it on a coherent arch). [...]> diff --git a/drivers/scsi/53c700.h b/drivers/scsi/53c700.h > index 05fe439b66afe5..0f545b05fe611d 100644 > --- a/drivers/scsi/53c700.h > +++ b/drivers/scsi/53c700.h > @@ -209,6 +209,7 @@ struct NCR_700_Host_Parameters { > #endif > __u32 chip710:1; /* set if really a 710 not > 700 */ > __u32 burst_length:4; /* set to 0 to disable > 710 bursting */ > + __u32 noncoherent:1; /* needs to use non- > coherent DMA */ > > /* NOTHING BELOW HERE NEEDS ALTERING */ > __u32 fast:1; /* if we can alter the > SCSI bus clock > @@ -429,7 +430,7 @@ struct NCR_700_Host_Parameters { > for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); > i++) { \ > __u32 val > bS_to_cpu((script)[A_##symbol##_used[i]]) + da; \ > (script)[A_##symbol##_used[i]] = bS_to_host(val); \ > - dma_cache_sync((dev), > &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \ > + dma_sync_to_dev((dev), > &(script)[A_##symbol##_used[i]], 4); \ > DEBUG((" script, patching %s at %d to %pad\n", \ > #symbol, A_##symbol##_used[i], &da)); \ > } \ > @@ -441,7 +442,7 @@ struct NCR_700_Host_Parameters { > dma_addr_t da = value; \ > for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); > i++) { \ > (script)[A_##symbol##_used[i]] = bS_to_host(da); \ > - dma_cache_sync((dev), > &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \ > + dma_sync_to_dev((dev), > &(script)[A_##symbol##_used[i]], 4); \ > DEBUG((" script, patching %s at %d to %pad\n", \ > #symbol, A_##symbol##_used[i], &da)); \ > } \ > @@ -456,7 +457,7 @@ struct NCR_700_Host_Parameters { > val &= 0xff00ffff; \ > val |= ((value) & 0xff) << 16; \ > (script)[A_##symbol##_used[i]] = bS_to_host(val); \ > - dma_cache_sync((dev), > &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \ > + dma_sync_to_dev((dev), > &(script)[A_##symbol##_used[i]], 4); \ > DEBUG((" script, patching ID field %s at %d to > 0x%x\n", \ > #symbol, A_##symbol##_used[i], val)); \ > } \ > @@ -470,7 +471,7 @@ struct NCR_700_Host_Parameters { > val &= 0xffff0000; \ > val |= ((value) & 0xffff); \ > (script)[A_##symbol##_used[i]] = bS_to_host(val); \ > - dma_cache_sync((dev), > &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \ > + dma_sync_to_dev((dev), > &(script)[A_##symbol##_used[i]], 4); \ > DEBUG((" script, patching short field %s at %d to > 0x%x\n", \ > #symbol, A_##symbol##_used[i], val)); \ > } \These macro arguments need updating. Since you changed the input from hostdata->dev to hostdata, leaving the macro argument as dev is simply misleading. It needs to become hostdata or h. James
Thomas Bogendoerfer
2020-Sep-01  15:22 UTC
[Nouveau] [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
On Wed, Aug 19, 2020 at 08:55:49AM +0200, Christoph Hellwig wrote:> Use the proper modern API to transfer cache ownership for incoherent DMA. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > --- > drivers/net/ethernet/seeq/sgiseeq.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c > index 39599bbb5d45b6..f91dae16d69a19 100644 > --- a/drivers/net/ethernet/seeq/sgiseeq.c > +++ b/drivers/net/ethernet/seeq/sgiseeq.c > @@ -112,14 +112,18 @@ struct sgiseeq_private { > > static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr) > { > - dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc), > - DMA_FROM_DEVICE); > + struct sgiseeq_private *sp = netdev_priv(dev); > + > + dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr), > + sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL); > } > > static inline void dma_sync_desc_dev(struct net_device *dev, void *addr) > { > - dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc), > - DMA_TO_DEVICE); > + struct sgiseeq_private *sp = netdev_priv(dev); > + > + dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr), > + sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL); > }this breaks ethernet on IP22 completely, but I haven't figured out why, yet. Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea. [ RFC1925, 2.3 ]
Seemingly Similar Threads
- [PATCH 07/28] 53c700: improve non-coherent DMA handling
- [PATCH 07/17] 53c700: improve non-coherent DMA handling
- [PATCH 07/18] 53c700: improve non-coherent DMA handling
- [PATCH 07/17] 53c700: improve non-coherent DMA handling
- [PATCH 13/18] 53c700: convert to dma_alloc_noncoherent