thr3ads.net - Nouveau - [Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping [Apr 2018]

If this information is useful, please help other people find it:
Share via:

Thierry Reding

2018-Apr-25 10:10 UTC

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

From: Thierry Reding <treding at nvidia.com>

Depending on the kernel configuration, early ARM architecture setup code
may have attached the GPU to a DMA/IOMMU mapping that transparently uses
the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
backed buffers (a special bit in the GPU's MMU page tables indicates the
memory path to take: via the SMMU or directly to the memory controller).
Transparently backing DMA memory with an IOMMU prevents Nouveau from
properly handling such memory accesses and causes memory access faults.

As a side-note: buffers other than those allocated in instance memory
don't need to be physically contiguous from the GPU's perspective since
the GPU can map them into contiguous buffers using its own MMU. Mapping
these buffers through the IOMMU is unnecessary and will even lead to
performance degradation because of the additional translation.

Signed-off-by: Thierry Reding <treding at nvidia.com>
---
I had already sent this out independently to fix a regression that was
introduced in v4.16, but then Christoph pointed out that it should've
been sent to a wider audience and should use a core API rather than
calling into architecture code directly.

I've added it to this series for easier reference and to show the need
for the new API.

 .../drm/nouveau/nvkm/engine/device/tegra.c    | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index 78597da6313a..23428a7056e9 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -19,6 +19,11 @@
  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  * DEALINGS IN THE SOFTWARE.
  */
+
+#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
+#include <asm/dma-iommu.h>
+#endif
+
 #include <core/tegra.h>
 #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER
 #include "priv.h"
@@ -105,6 +110,20 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra
*tdev)
 	unsigned long pgsize_bitmap;
 	int ret;
 
+#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
+	if (dev->archdata.mapping) {
+		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+
+		arm_iommu_release_mapping(mapping);
+		arm_iommu_detach_device(dev);
+
+		if (dev->archdata.dma_coherent)
+			set_dma_ops(dev, &arm_coherent_dma_ops);
+		else
+			set_dma_ops(dev, &arm_dma_ops);
+	}
+#endif
+
 	if (!tdev->func->iommu_bit)
 		return;
 
-- 
2.17.0

Thierry Reding

2018-Apr-25 10:10 UTC

head link

[Nouveau] [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API

From: Thierry Reding <treding at nvidia.com>

The dma_iommu_detach_device() API can be used by drivers to forcibly
detach a device from an IOMMU that architecture code might have attached
to. This is useful for drivers that need explicit control over the IOMMU
using the IOMMU API directly.

Signed-off-by: Thierry Reding <treding at nvidia.com>
---
 drivers/base/dma-mapping.c  | 8 ++++++++
 include/linux/dma-mapping.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
index d82566d6e237..18ddf32b10c9 100644
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -366,3 +366,11 @@ void dma_deconfigure(struct device *dev)
 	of_dma_deconfigure(dev);
 	acpi_dma_deconfigure(dev);
 }
+
+void dma_iommu_detach_device(struct device *dev)
+{
+#ifdef arch_iommu_detach_device
+	arch_iommu_detach_device(dev);
+#endif
+}
+EXPORT_SYMBOL(dma_iommu_detach_device);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0f589e..732191a2c64e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -671,6 +671,8 @@ static inline void arch_setup_dma_ops(struct device *dev,
u64 dma_base,
 static inline void arch_teardown_dma_ops(struct device *dev) { }
 #endif
 
+extern void dma_iommu_detach_device(struct device *dev);
+
 static inline unsigned int dma_get_max_seg_size(struct device *dev)
 {
 	if (dev->dma_parms && dev->dma_parms->max_segment_size)
-- 
2.17.0

Thierry Reding

2018-Apr-25 10:10 UTC

head link

[Nouveau] [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()

From: Thierry Reding <treding at nvidia.com>

Implement this function to enable drivers from detaching from any IOMMU
domains that architecture code might have attached them to so that they
can take exclusive control of the IOMMU via the IOMMU API.

Signed-off-by: Thierry Reding <treding at nvidia.com>
---
Changes in v2:
- fix compilation

 arch/arm/include/asm/dma-mapping.h |  3 +++
 arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
 arch/arm/mm/dma-mapping.c          | 17 +++++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/arch/arm/include/asm/dma-mapping.h
b/arch/arm/include/asm/dma-mapping.h
index 8436f6ade57d..d6d5bd44f962 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64
dma_base, u64 size,
 #define arch_teardown_dma_ops arch_teardown_dma_ops
 extern void arch_teardown_dma_ops(struct device *dev);
 
+#define arch_iommu_detach_device arch_iommu_detach_device
+extern void arch_iommu_detach_device(struct device *dev);
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 619f24a42d09..60fef97d8452 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -242,6 +242,10 @@ void arch_teardown_dma_ops(struct device *dev)
 {
 }
 
+void arch_iommu_detach_device(struct device *dev)
+{
+}
+
 #define PREALLOC_DMA_DEBUG_ENTRIES	4096
 
 static int __init dma_debug_do_init(void)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8c398fedbbb6..cc63a25bd088 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2423,3 +2423,20 @@ void arch_teardown_dma_ops(struct device *dev)
 
 	arm_teardown_iommu_dma_ops(dev);
 }
+
+void arch_iommu_detach_device(struct device *dev)
+{
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
+	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	const struct dma_map_ops *dma_ops;
+
+	if (!mapping)
+		return;
+
+	arm_iommu_release_mapping(mapping);
+	arm_iommu_detach_device(dev);
+
+	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
+	set_dma_ops(dev, dma_ops);
+#endif
+}
-- 
2.17.0

Thierry Reding

2018-Apr-25 10:10 UTC

head link

[Nouveau] [PATCH v2 4/5] drm/nouveau: tegra: Use dma_iommu_detach_device()

From: Thierry Reding <treding at nvidia.com>

Use the new dma_iommu_detach_device() function to replace the open-coded
equivalent.

Signed-off-by: Thierry Reding <treding at nvidia.com>
---
 .../drm/nouveau/nvkm/engine/device/tegra.c    | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index 23428a7056e9..c0a7f3839cbb 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -20,10 +20,6 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-#include <asm/dma-iommu.h>
-#endif
-
 #include <core/tegra.h>
 #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER
 #include "priv.h"
@@ -110,19 +106,8 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra
*tdev)
 	unsigned long pgsize_bitmap;
 	int ret;
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-	if (dev->archdata.mapping) {
-		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-		arm_iommu_release_mapping(mapping);
-		arm_iommu_detach_device(dev);
-
-		if (dev->archdata.dma_coherent)
-			set_dma_ops(dev, &arm_coherent_dma_ops);
-		else
-			set_dma_ops(dev, &arm_dma_ops);
-	}
-#endif
+	/* make sure we can use the IOMMU exclusively */
+	dma_iommu_detach_device(dev);
 
 	if (!tdev->func->iommu_bit)
 		return;
-- 
2.17.0

Thierry Reding

2018-Apr-25 10:10 UTC

head link

[Nouveau] [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU

From: Thierry Reding <treding at nvidia.com>

The ARM_DMA_USE_IOMMU Kconfig option has side-effects that drivers can
not opt into but have to explicitly opt out of. This can lead to subtle
bugs that are difficult to track down and not immediately obvious to be
related to this Kconfig option.

To avoid this confusion, always enable the option to expose any lurking
bugs once and allow any regressions introduced by the DMA/IOMMU code to
be caught more quickly in the future.

Note that some drivers still use the Kconfig symbol to provide different
code paths depending on what architecture the code runs on (e.g. 32-bit
ARM vs. 64-bit ARM which have different and incompatible implementations
of the DMA/IOMMU integration code), so leave the symbol in place to keep
those drivers working.

For the long term, it is preferable to transition everyone to the
generic DMA/IOMMU integration code in drivers/iommu/dma-iommu.c.

Signed-off-by: Thierry Reding <treding at nvidia.com>
---
Changes in v2:
- new patch

 arch/arm/Kconfig               |  2 +-
 arch/arm/include/asm/device.h  |  6 ------
 arch/arm/mm/dma-mapping.c      | 18 ------------------
 drivers/iommu/Kconfig          |  7 -------
 drivers/media/platform/Kconfig |  1 -
 5 files changed, 1 insertion(+), 33 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index fa0b190f8a38..3c91de78535a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -124,7 +124,7 @@ config NEED_SG_DMA_LENGTH
 	bool
 
 config ARM_DMA_USE_IOMMU
-	bool
+	def_bool y
 	select ARM_HAS_SG_CHAIN
 	select NEED_SG_DMA_LENGTH
 
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 3234fe9bba6e..c3cf38e16866 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -13,9 +13,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu; /* private IOMMU data */
 #endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 	struct dma_iommu_mapping	*mapping;
-#endif
 #ifdef CONFIG_XEN
 	const struct dma_map_ops *dev_dma_ops;
 #endif
@@ -31,10 +29,6 @@ struct pdev_archdata {
 #endif
 };
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 #define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
 
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cc63a25bd088..f6c28ed5651a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1174,8 +1174,6 @@ static int __init dma_debug_do_init(void)
 }
 core_initcall(dma_debug_do_init);
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-
 static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs)
 {
 	int prot = 0;
@@ -2366,20 +2364,6 @@ static void arm_teardown_iommu_dma_ops(struct device
*dev)
 	arm_iommu_release_mapping(mapping);
 }
 
-#else
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    const struct iommu_ops *iommu)
-{
-	return false;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev) { }
-
-#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops
-
-#endif	/* CONFIG_ARM_DMA_USE_IOMMU */
-
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
@@ -2426,7 +2410,6 @@ void arch_teardown_dma_ops(struct device *dev)
 
 void arch_iommu_detach_device(struct device *dev)
 {
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
 	const struct dma_map_ops *dma_ops;
 
@@ -2438,5 +2421,4 @@ void arch_iommu_detach_device(struct device *dev)
 
 	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
 	set_dma_ops(dev, dma_ops);
-#endif
 }
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index df171cb85822..7f0b3ca76a17 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -226,7 +226,6 @@ config ROCKCHIP_IOMMU
 	depends on ARM || ARM64
 	depends on ARCH_ROCKCHIP || COMPILE_TEST
 	select IOMMU_API
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for IOMMUs found on Rockchip rk32xx SOCs.
 	  These IOMMUs allow virtualization of the address space used by most
@@ -259,7 +258,6 @@ config EXYNOS_IOMMU
 	depends on ARCH_EXYNOS && MMU
 	depends on !CPU_BIG_ENDIAN # revisit driver if we can enable big-endian ptes
 	select IOMMU_API
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for the IOMMU (System MMU) of Samsung Exynos application
 	  processor family. This enables H/W multimedia accelerators to see
@@ -283,7 +281,6 @@ config IPMMU_VMSA
 	depends on ARCH_RENESAS || (COMPILE_TEST && !GENERIC_ATOMIC64)
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for the Renesas VMSA-compatible IPMMU Renesas found in the
 	  R-Mobile APE6 and R-Car H2/M2 SoCs.
@@ -304,7 +301,6 @@ config ARM_SMMU
 	depends on (ARM64 || ARM) && MMU
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU if ARM
 	help
 	  Support for implementations of the ARM System MMU architecture
 	  versions 1 and 2.
@@ -344,7 +340,6 @@ config MTK_IOMMU
 	bool "MTK IOMMU Support"
 	depends on ARM || ARM64
 	depends on ARCH_MEDIATEK || COMPILE_TEST
-	select ARM_DMA_USE_IOMMU
 	select IOMMU_API
 	select IOMMU_DMA
 	select IOMMU_IO_PGTABLE_ARMV7S
@@ -361,7 +356,6 @@ config MTK_IOMMU_V1
 	bool "MTK IOMMU Version 1 (M4U gen1) Support"
 	depends on ARM
 	depends on ARCH_MEDIATEK || COMPILE_TEST
-	select ARM_DMA_USE_IOMMU
 	select IOMMU_API
 	select MEMORY
 	select MTK_SMI
@@ -379,7 +373,6 @@ config QCOM_IOMMU
 	depends on HAS_DMA
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for IOMMU on certain Qualcomm SoCs.
 
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index e3229f7baed1..5f7135b052ed 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -66,7 +66,6 @@ config VIDEO_OMAP3
 	depends on (ARCH_OMAP3 && OMAP_IOMMU) || COMPILE_TEST
 	depends on COMMON_CLK
 	depends on HAS_DMA && OF
-	select ARM_DMA_USE_IOMMU if OMAP_IOMMU
 	select VIDEOBUF2_DMA_CONTIG
 	select MFD_SYSCON
 	select V4L2_FWNODE
-- 
2.17.0

Russell King - ARM Linux

2018-Apr-25 10:25 UTC

head link

[Nouveau] [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU

On Wed, Apr 25, 2018 at 12:10:51PM +0200, Thierry Reding
wrote:> From: Thierry Reding <treding at nvidia.com>
> 
> The ARM_DMA_USE_IOMMU Kconfig option has side-effects that drivers can
> not opt into but have to explicitly opt out of. This can lead to subtle
> bugs that are difficult to track down and not immediately obvious to be
> related to this Kconfig option.
> 
> To avoid this confusion, always enable the option to expose any lurking
> bugs once and allow any regressions introduced by the DMA/IOMMU code to
> be caught more quickly in the future.
> 
> Note that some drivers still use the Kconfig symbol to provide different
> code paths depending on what architecture the code runs on (e.g. 32-bit
> ARM vs. 64-bit ARM which have different and incompatible implementations
> of the DMA/IOMMU integration code), so leave the symbol in place to keep
> those drivers working.
> 
> For the long term, it is preferable to transition everyone to the
> generic DMA/IOMMU integration code in drivers/iommu/dma-iommu.c.
> 
> Signed-off-by: Thierry Reding <treding at nvidia.com>
> ---
> Changes in v2:
> - new patch
> 
>  arch/arm/Kconfig               |  2 +-
>  arch/arm/include/asm/device.h  |  6 ------
>  arch/arm/mm/dma-mapping.c      | 18 ------------------
>  drivers/iommu/Kconfig          |  7 -------
>  drivers/media/platform/Kconfig |  1 -
>  5 files changed, 1 insertion(+), 33 deletions(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index fa0b190f8a38..3c91de78535a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -124,7 +124,7 @@ config NEED_SG_DMA_LENGTH
>  	bool
>  
>  config ARM_DMA_USE_IOMMU
> -	bool
> +	def_bool y
>  	select ARM_HAS_SG_CHAIN
>  	select NEED_SG_DMA_LENGTH
This doesn't work - as has recently been discussed with hch, we can't
globally enable NEED_SG_DMA_LENGTH on ARM - the ARM architecture
pre-dates the addition of the DMA length member in the scatterlist,
and not every machine supports the splitting of the DMA length from
the non-DMA length member.  Hence, this will cause a regression,
sorry.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

Christoph Hellwig

2018-Apr-25 15:18 UTC

head link

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

The series seems to miss a cover letter.

Also I really think this patch original patch shouldn't be in the proper
series.

Christoph Hellwig

2018-Apr-25 15:19 UTC

head link

[Nouveau] [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API

On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding
wrote:> From: Thierry Reding <treding at nvidia.com>
> 
> The dma_iommu_detach_device() API can be used by drivers to forcibly
> detach a device from an IOMMU that architecture code might have attached
> to. This is useful for drivers that need explicit control over the IOMMU
> using the IOMMU API directly.
Given that no one else implements it making it a generic API seems
rather confusing.  For now I'd rename it to
arm_dma_iommu_detach_device() and only implement it in arm.

Once I've got the dma mapping implementations consolidated to a small
enough number we could think about something like a device quirk that
tells the architecture to simply never even attach the iommu dma ops
to start with.

Christoph Hellwig

2018-Apr-25 15:20 UTC

head link

[Nouveau] [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()

> +void arch_iommu_detach_device(struct device *dev)
> +{
> +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> +	const struct dma_map_ops *dma_ops;
> +
> +	if (!mapping)
> +		return;
> +
> +	arm_iommu_release_mapping(mapping);
> +	arm_iommu_detach_device(dev);
> +
> +	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
> +	set_dma_ops(dev, dma_ops);
Why not simply:

	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));

Jordan Crouse

2018-Apr-25 15:28 UTC

head link

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding
wrote:> From: Thierry Reding <treding at nvidia.com>
> 
> Depending on the kernel configuration, early ARM architecture setup code
> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> backed buffers (a special bit in the GPU's MMU page tables indicates
the
> memory path to take: via the SMMU or directly to the memory controller).
> Transparently backing DMA memory with an IOMMU prevents Nouveau from
> properly handling such memory accesses and causes memory access faults.
> 
> As a side-note: buffers other than those allocated in instance memory
> don't need to be physically contiguous from the GPU's perspective
since
> the GPU can map them into contiguous buffers using its own MMU. Mapping
> these buffers through the IOMMU is unnecessary and will even lead to
> performance degradation because of the additional translation.
> 
> Signed-off-by: Thierry Reding <treding at nvidia.com>
> ---
> I had already sent this out independently to fix a regression that was
> introduced in v4.16, but then Christoph pointed out that it should've
> been sent to a wider audience and should use a core API rather than
> calling into architecture code directly.
> 
> I've added it to this series for easier reference and to show the need
> for the new API.
This is good stuff, I am struggling with something similar on ARM64. One
problem that I wasn't able to fully solve cleanly was that for arm-smmu 
the SMMU HW resources are not released until the domain itself is destroyed
and I never quite figured out a way to swap the default domain cleanly.

This is a problem for the MSM GPU because not only do we run our own IOMMU as
you do we also have a hardware dependency to use context bank 0 to
asynchronously switch the pagetable during rendering.

I'm not sure if this is a problem you have encountered. In any event, this
code
gets us a little bit further down the path and at least there is somebody else
out there in the cold dark world that understands my pain. :)

For your interest, here was my half-hearted attempt to avoid creating DMA
domains in the first place based on a blacklist to try to spur a bit of
discussion: https://patchwork.freedesktop.org/series/41573/

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Thierry Reding

2018-Apr-26 12:09 UTC

head link

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

On Wed, Apr 25, 2018 at 08:18:15AM -0700, Christoph Hellwig
wrote:> The series seems to miss a cover letter.
> 
> Also I really think this patch original patch shouldn't be in the
proper
> series.
I added a note explaining why I included this. Not everyone in this
discussion had seen the patch and therefore may not be aware of the
problem that the series attempts to fix. I agree that it shouldn't
be merged as part of the series, though.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL:
<https://lists.freedesktop.org/archives/nouveau/attachments/20180426/a91efdbf/attachment.sig>

Thierry Reding

2018-Apr-26 12:41 UTC

head link

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

On Wed, Apr 25, 2018 at 09:28:49AM -0600, Jordan Crouse
wrote:> On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding at nvidia.com>
> > 
> > Depending on the kernel configuration, early ARM architecture setup
code
> > may have attached the GPU to a DMA/IOMMU mapping that transparently
uses
> > the IOMMU to back the DMA API. Tegra requires special handling for
IOMMU
> > backed buffers (a special bit in the GPU's MMU page tables
indicates the
> > memory path to take: via the SMMU or directly to the memory
controller).
> > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > properly handling such memory accesses and causes memory access
faults.
> > 
> > As a side-note: buffers other than those allocated in instance memory
> > don't need to be physically contiguous from the GPU's
perspective since
> > the GPU can map them into contiguous buffers using its own MMU.
Mapping
> > these buffers through the IOMMU is unnecessary and will even lead to
> > performance degradation because of the additional translation.
> > 
> > Signed-off-by: Thierry Reding <treding at nvidia.com>
> > ---
> > I had already sent this out independently to fix a regression that was
> > introduced in v4.16, but then Christoph pointed out that it
should've
> > been sent to a wider audience and should use a core API rather than
> > calling into architecture code directly.
> > 
> > I've added it to this series for easier reference and to show the
need
> > for the new API.
> 
> This is good stuff, I am struggling with something similar on ARM64. One
> problem that I wasn't able to fully solve cleanly was that for arm-smmu
> the SMMU HW resources are not released until the domain itself is destroyed
> and I never quite figured out a way to swap the default domain cleanly.
> 
> This is a problem for the MSM GPU because not only do we run our own IOMMU
as
> you do we also have a hardware dependency to use context bank 0 to
> asynchronously switch the pagetable during rendering.
> 
> I'm not sure if this is a problem you have encountered.
I don't think I have. Recent chips have similar capabilities, but they
don't have the restriction to a context bank, as far as I understand.
Adding Mikko who's had more exposure to this.
> In any event, this code
> gets us a little bit further down the path and at least there is somebody
else
> out there in the cold dark world that understands my pain. :)
This doesn't actually fix anything on 64-bit ARM, and oddly enough I
haven't run into this issue myself on 64-bit ARM either. I think the
reason is that I haven't tested Nouveau on Tegra186 yet, which is the
first SoC which has an ARM SMMU. On prior 64-bit ARM chips we've used
the custom Tegra SMMU and that driver simply forbids creating any DMA
domains, so it will allow only explicit usage of the IOMMU API. There
is code in the generic DMA/IOMMU integration layer to not use the DMA
API with non-DMA IOMMU domains, but that's not true on 32-bit ARM,
unfortunately. It's entirely possible that Tegra186 will show exactly
the same problem that you are describing.

We do use the IOMMU API explicitly in the Tegra DRM driver as well,
and that is something that I've tested on Tegra186 and that I know to
be working. However, the reason why it works there is that the IOMMU
group will contain multiple display controllers, which will again
trigger a special case that will prevent the DMA/IOMMU integration
from setting up a DMA domain for use with those devices.
> For your interest, here was my half-hearted attempt to avoid creating DMA
> domains in the first place based on a blacklist to try to spur a bit of
> discussion: https://patchwork.freedesktop.org/series/41573/
This looks very interesting and simple, but I can imagine that it will
see significant pushback from the ARM SMMU maintainers (if not complete
silence), because it adds SoC-specific knowledge to an otherwise fully
generic driver. Having the GPU driver explicitly detach from the IOMMU
domain sounds like a better option, but I don't see how that would help
with the context bank issue that you're seeing.

One other possibility that I can imagine is to add something to struct
device that could be used by arch_setup_dma_ops() to not attach any of
the IOMMU-backed DMA operations to the device. Unfortunately that code
is called before the driver's ->probe() implementation is called, so a
driver doesn't have an opportunity to set it. Something like
of_dma_configure() could still set that up, perhaps based on some DT
property, though I can already hear the NAK from DT maintainers because
this is, after all, policy, not hardware description.

The last solution that I can think of that might allow us to do this is
to add a flag to struct device_driver (bool explicit_iommu?) that will
be used by the DMA/IOMMU setup code to decide whether or not to attach
to the IOMMU automatically.

Though, again, I'm not sure that would actually solve your bank problem.
That's really something I don't see any other solution than to fix it in
the ARM SMMU driver. Perhaps context bank 0 can always be reserved for
non-DMA domains?

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL:
<https://lists.freedesktop.org/archives/nouveau/attachments/20180426/86bf02b0/attachment.sig>

Possibly Parallel Threads

Search for more reasonably related threads

Nouveau - Apr 2018 - [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

[Nouveau] [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API

[Nouveau] [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()

[Nouveau] [PATCH v2 4/5] drm/nouveau: tegra: Use dma_iommu_detach_device()

[Nouveau] [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU

[Nouveau] [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

[Nouveau] [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API

[Nouveau] [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

[Nouveau] [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

Possibly Parallel Threads