Ard Biesheuvel
2016-Sep-26 12:32 UTC
[Nouveau] [PATCH v4 0/3] drm/nouveau: set DMA mask before mapping scratch page
This v4 is now a three-patch series, after Alexandre pointed out that both
GF100 and NV50 are affected by the same issue, and that a related issue has
already been solved for Tegra in commit 9d0394c6bed5
("drm/nouveau/instmem/gk20a: set DMA mask early").

The issue that this series addresses is the fact that the nouveau driver
invokes the DMA API before setting the DMA mask. In both cases addressed
here, these are simply static bidirectional mappings of scratch pages whose
purpose is not well understood, and in most cases it does not matter that
these pages are always allocated below 4 GB, even if the hardware can
access memory much higher up. However, on platforms without any RAM below
4 GB, the default 32-bit DMA mask prevents the nouveau driver from loading
on GF100 and NV50 hardware, with an error like the following:

  nouveau 0000:02:00.0: enabling device (0000 -> 0003)
  nouveau 0000:02:00.0: NVIDIA GT218 (0a8280b1)
  nouveau 0000:02:00.0: bios: version 70.18.a6.00.00
  nouveau 0000:02:00.0: fb ctor failed, -14
  nouveau: probe of 0000:02:00.0 failed with error -14

So fix this by setting a preliminary DMA mask based on the MMU subdevice
'dma_bits' property (patch #1), and by postponing the mapping of the
scratch pages to the respective FB .init() hooks (patches #2 and #3).

v4: split and move dma_set_mask to probe hook (Alexandre)
v3: rework code to get rid of DMA_ERROR_CODE references, which is not
    defined on all architectures
v2: replace incorrect comparison of dma_addr_t type var against NULL

Ard Biesheuvel (3):
  drm/nouveau: set streaming DMA mask early
  drm/nouveau/fb/gf100: defer DMA mapping of scratch page to init() hook
  drm/nouveau/fb/nv50: defer DMA mapping of scratch page to init() hook

 drivers/gpu/drm/nouveau/nouveau_drm.c          | 11 +++++++
 drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c | 26 +++++++++++------
 drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv50.c  | 30 +++++++++++++-------
 3 files changed, 48 insertions(+), 19 deletions(-)

-- 
2.7.4
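The ordering problem the series fixes can be made concrete with a short
sketch (hypothetical code, not taken from nouveau): dma_map_page() honours
whatever mask the device carries at the time of the call, so a mapping
created before the mask is widened is confined to the default 32-bit
window.

    #include <linux/dma-mapping.h>

    static int scratch_map_example(struct device *dev, struct page *scratch)
    {
            dma_addr_t addr;

            /* The device still carries the default DMA_BIT_MASK(32) here. */
            addr = dma_map_page(dev, scratch, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
            if (dma_mapping_error(dev, addr))
                    return -EFAULT; /* expected when no RAM exists below 4 GB */

            /* Widening the mask now comes too late for the mapping above. */
            dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
            return 0;
    }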
Ard Biesheuvel
2016-Sep-26 12:32 UTC
[Nouveau] [PATCH v4 1/3] drm/nouveau: set streaming DMA mask early
Some subdevices (i.e., fb/nv50.c and fb/gf100.c) map a scratch page using
dma_map_page() way before the TTM layer has had a chance to set the DMA
mask. This may prevent the driver from loading at all on platforms whose
system memory is not covered by the default 32-bit DMA mask (i.e., when
all RAM is above 4 GB).

So set a preliminary DMA mask right after constructing the PCI device, and
base it on the .dma_bits member of the MMU subdevice, which is what the
TTM layer will base the DMA mask on as well.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 652ab111dd74..e61e9a0adb51 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -361,6 +361,17 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
 
 	pci_set_master(pdev);
 
+	/*
+	 * Set a preliminary DMA mask based on the .dma_bits member of the
+	 * MMU subdevice. This allows other subdevices to create DMA mappings
+	 * in their init() function, which are called before the TTM layer sets
+	 * the DMA mask definitively.
+	 * This is necessary for platforms where the default DMA mask of 32
+	 * does not cover any system memory, i.e., when all RAM is > 4 GB.
+	 */
+	dma_set_mask_and_coherent(device->dev,
+				  DMA_BIT_MASK(device->mmu->dma_bits));
+
 	ret = drm_get_pci_dev(pdev, pent, &driver_pci);
 	if (ret) {
 		nvkm_device_del(&device);
-- 
2.7.4
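One small nit a reader might spot in the hunk above: the return value of
dma_set_mask_and_coherent() is ignored. That is benign here, since if the
call fails the mask simply stays at the 32-bit default and any later
mappings fail exactly as they did before this patch. A stricter variant
could fall back explicitly, along these lines (a sketch, not part of the
patch; the helper name is made up):

    /* Try the MMU-reported mask; fall back to 32 bits on failure. */
    static void set_preliminary_dma_mask(struct device *dev, u8 dma_bits)
    {
            if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(dma_bits)))
                    dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
    }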
Ard Biesheuvel
2016-Sep-26 12:32 UTC
[Nouveau] [PATCH v4 2/3] drm/nouveau/fb/gf100: defer DMA mapping of scratch page to init() hook
The 100c10 scratch page is mapped using dma_map_page() before the TTM
layer has had a chance to set the DMA mask. This means we are still
running with the default of 32 when this code executes, and this causes
problems for platforms with no memory below 4 GB (such as AMD Seattle).

So move the dma_map_page() to the .init hook, which executes after the
DMA mask has been set.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c | 26 ++++++++++++++------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c
index 76433cc66fff..5c8132873e60 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c
@@ -93,7 +93,18 @@ gf100_fb_init(struct nvkm_fb *base)
 	struct gf100_fb *fb = gf100_fb(base);
 	struct nvkm_device *device = fb->base.subdev.device;
 
-	if (fb->r100c10_page)
+	if (!fb->r100c10) {
+		dma_addr_t addr = dma_map_page(device->dev, fb->r100c10_page, 0,
+					       PAGE_SIZE, DMA_BIDIRECTIONAL);
+		if (!dma_mapping_error(device->dev, addr)) {
+			fb->r100c10 = addr;
+		} else {
+			nvkm_warn(&fb->base.subdev,
+				  "dma_map_page() failed on 100c10 page\n");
+		}
+	}
+
+	if (fb->r100c10)
 		nvkm_wr32(device, 0x100c10, fb->r100c10 >> 8);
 }
 
@@ -103,12 +114,13 @@ gf100_fb_dtor(struct nvkm_fb *base)
 	struct gf100_fb *fb = gf100_fb(base);
 	struct nvkm_device *device = fb->base.subdev.device;
 
-	if (fb->r100c10_page) {
+	if (fb->r100c10) {
 		dma_unmap_page(device->dev, fb->r100c10, PAGE_SIZE,
 			       DMA_BIDIRECTIONAL);
-		__free_page(fb->r100c10_page);
 	}
 
+	__free_page(fb->r100c10_page);
+
 	return fb;
 }
 
@@ -124,11 +136,9 @@ gf100_fb_new_(const struct nvkm_fb_func *func, struct nvkm_device *device,
 	*pfb = &fb->base;
 
 	fb->r100c10_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (fb->r100c10_page) {
-		fb->r100c10 = dma_map_page(device->dev, fb->r100c10_page, 0,
-					   PAGE_SIZE, DMA_BIDIRECTIONAL);
-		if (dma_mapping_error(device->dev, fb->r100c10))
-			return -EFAULT;
+	if (!fb->r100c10_page) {
+		nvkm_error(&fb->base.subdev, "failed 100c10 page alloc\n");
+		return -ENOMEM;
 	}
 
 	return 0;
-- 
2.7.4
Ard Biesheuvel
2016-Sep-26 12:32 UTC
[Nouveau] [PATCH v4 3/3] drm/nouveau/fb/nv50: defer DMA mapping of scratch page to init() hook
The 100c08 scratch page is mapped using dma_map_page() before the TTM
layer has had a chance to set the DMA mask. This means we are still
running with the default of 32 when this code executes, and this causes
problems for platforms with no memory below 4 GB (such as AMD Seattle).

So move the dma_map_page() to the .init hook, which executes after the
DMA mask has been set.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv50.c | 30 +++++++++++++-------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv50.c b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv50.c
index 1b5fb02eab2a..f029aaf01831 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/nv50.c
@@ -216,11 +216,23 @@ nv50_fb_init(struct nvkm_fb *base)
 	struct nv50_fb *fb = nv50_fb(base);
 	struct nvkm_device *device = fb->base.subdev.device;
 
+	if (!fb->r100c08) {
+		dma_addr_t addr = dma_map_page(device->dev, fb->r100c08_page, 0,
+					       PAGE_SIZE, DMA_BIDIRECTIONAL);
+		if (!dma_mapping_error(device->dev, addr)) {
+			fb->r100c08 = addr;
+		} else {
+			nvkm_warn(&fb->base.subdev,
+				  "dma_map_page() failed on 100c08 page\n");
+		}
+	}
+
 	/* Not a clue what this is exactly. Without pointing it at a
 	 * scratch page, VRAM->GART blits with M2MF (as in DDX DFS)
 	 * cause IOMMU "read from address 0" errors (rh#561267)
 	 */
-	nvkm_wr32(device, 0x100c08, fb->r100c08 >> 8);
+	if (fb->r100c08)
+		nvkm_wr32(device, 0x100c08, fb->r100c08 >> 8);
 
 	/* This is needed to get meaningful information from 100c90
 	 * on traps. No idea what these values mean exactly. */
@@ -233,11 +245,11 @@ nv50_fb_dtor(struct nvkm_fb *base)
 	struct nv50_fb *fb = nv50_fb(base);
 	struct nvkm_device *device = fb->base.subdev.device;
 
-	if (fb->r100c08_page) {
+	if (fb->r100c08)
 		dma_unmap_page(device->dev, fb->r100c08, PAGE_SIZE,
 			       DMA_BIDIRECTIONAL);
-		__free_page(fb->r100c08_page);
-	}
+
+	__free_page(fb->r100c08_page);
 
 	return fb;
 }
@@ -264,13 +276,9 @@ nv50_fb_new_(const struct nv50_fb_func *func, struct nvkm_device *device,
 	*pfb = &fb->base;
 
 	fb->r100c08_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (fb->r100c08_page) {
-		fb->r100c08 = dma_map_page(device->dev, fb->r100c08_page, 0,
-					   PAGE_SIZE, DMA_BIDIRECTIONAL);
-		if (dma_mapping_error(device->dev, fb->r100c08))
-			return -EFAULT;
-	} else {
-		nvkm_warn(&fb->base.subdev, "failed 100c08 page alloc\n");
+	if (!fb->r100c08_page) {
+		nvkm_error(&fb->base.subdev, "failed 100c08 page alloc\n");
+		return -ENOMEM;
 	}
 
 	return 0;
-- 
2.7.4
Alexandre Courbot
2016-Oct-03 05:39 UTC
[Nouveau] [PATCH v4 1/3] drm/nouveau: set streaming DMA mask early
On Mon, Sep 26, 2016 at 9:32 PM, Ard Biesheuvel
<ard.biesheuvel at linaro.org> wrote:
> Some subdevices (i.e., fb/nv50.c and fb/gf100.c) map a scratch page using
> dma_map_page() way before the TTM layer has had a chance to set the DMA
> mask. This may prevent the driver from loading at all on platforms whose
> system memory is not covered by the default 32-bit DMA mask (i.e., when
> all RAM is above 4 GB).
>
> So set a preliminary DMA mask right after constructing the PCI device, and
> base it on the .dma_bits member of the MMU subdevice, which is what the
> TTM layer will base the DMA mask on as well.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 652ab111dd74..e61e9a0adb51 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -361,6 +361,17 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
>
>         pci_set_master(pdev);
>
> +       /*
> +        * Set a preliminary DMA mask based on the .dma_bits member of the
> +        * MMU subdevice. This allows other subdevices to create DMA mappings
> +        * in their init() function, which are called before the TTM layer sets
> +        * the DMA mask definitively.
> +        * This is necessary for platforms where the default DMA mask of 32
> +        * does not cover any system memory, i.e., when all RAM is > 4 GB.
> +        */
> +       dma_set_mask_and_coherent(device->dev,
> +                                 DMA_BIT_MASK(device->mmu->dma_bits));

I would just move this to nvkm_device_pci_new() so that it perfectly
mirrors the same call done in nvkm_device_tegra_new(), which was done
for the same purpose.

Otherwise, looks good to me.
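Concretely, Alexandre's suggestion might look something like the following
inside nvkm_device_pci_new(), mirroring the Tegra path (a sketch only; the
variable names pdev and pci_dev are assumed from the surrounding function,
and this is not the final code):

    /* After nvkm_device_ctor() has populated the subdevices: */
    if (pdev->device.mmu)
            dma_set_mask_and_coherent(&pci_dev->dev,
                                      DMA_BIT_MASK(pdev->device.mmu->dma_bits));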
Alexandre Courbot
2016-Oct-03 05:44 UTC
[Nouveau] [PATCH v4 2/3] drm/nouveau/fb/gf100: defer DMA mapping of scratch page to init() hook
On Mon, Sep 26, 2016 at 9:32 PM, Ard Biesheuvel
<ard.biesheuvel at linaro.org> wrote:
> The 100c10 scratch page is mapped using dma_map_page() before the TTM
> layer has had a chance to set the DMA mask. This means we are still
> running with the default of 32 when this code executes, and this causes
> problems for platforms with no memory below 4 GB (such as AMD Seattle).
>
> So move the dma_map_page() to the .init hook, which executes after the
> DMA mask has been set.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c | 26 ++++++++++++++------
>  1 file changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c
> index 76433cc66fff..5c8132873e60 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c
> @@ -93,7 +93,18 @@ gf100_fb_init(struct nvkm_fb *base)
>         struct gf100_fb *fb = gf100_fb(base);
>         struct nvkm_device *device = fb->base.subdev.device;
>
> -       if (fb->r100c10_page)
> +       if (!fb->r100c10) {
> +               dma_addr_t addr = dma_map_page(device->dev, fb->r100c10_page, 0,
> +                                              PAGE_SIZE, DMA_BIDIRECTIONAL);
> +               if (!dma_mapping_error(device->dev, addr)) {
> +                       fb->r100c10 = addr;
> +               } else {
> +                       nvkm_warn(&fb->base.subdev,
> +                                 "dma_map_page() failed on 100c10 page\n");
> +               }
> +       }
> +
> +       if (fb->r100c10)
>                 nvkm_wr32(device, 0x100c10, fb->r100c10 >> 8);

gf100_fb_oneinit() seems to be a better place for this, since it will be
executed exactly once, which is what you want for a memory allocation.
As you can see, other memory allocations are also performed there, which
hints it should have been done there (and not in the ctor) in the first
place.

Maybe you can also move the alloc_page() there so everything is done in
the same place.

> }
>
> @@ -103,12 +114,13 @@ gf100_fb_dtor(struct nvkm_fb *base)
>         struct gf100_fb *fb = gf100_fb(base);
>         struct nvkm_device *device = fb->base.subdev.device;
>
> -       if (fb->r100c10_page) {
> +       if (fb->r100c10) {
>                 dma_unmap_page(device->dev, fb->r100c10, PAGE_SIZE,
>                                DMA_BIDIRECTIONAL);
> -               __free_page(fb->r100c10_page);
>         }
>
> +       __free_page(fb->r100c10_page);
> +

If you move the allocation/mapping to gf100_fb_oneinit() then I suppose
you don't need this change.
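Sketched out, the allocation and mapping moved into the oneinit hook could
look roughly like this (hypothetical; the existing body of
gf100_fb_oneinit() is elided and error paths are simplified):

    static int
    gf100_fb_oneinit(struct nvkm_fb *base)
    {
            struct gf100_fb *fb = gf100_fb(base);
            struct nvkm_device *device = fb->base.subdev.device;

            /* ... existing one-time allocations ... */

            /* By the time oneinit runs, the probe hook has set the DMA mask. */
            fb->r100c10_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
            if (!fb->r100c10_page)
                    return -ENOMEM;

            fb->r100c10 = dma_map_page(device->dev, fb->r100c10_page, 0,
                                       PAGE_SIZE, DMA_BIDIRECTIONAL);
            if (dma_mapping_error(device->dev, fb->r100c10))
                    return -EFAULT;

            return 0;
    }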
Alexandre Courbot
2016-Oct-03 05:45 UTC
[Nouveau] [PATCH v4 3/3] drm/nouveau/fb/nv50: defer DMA mapping of scratch page to init() hook
On Mon, Sep 26, 2016 at 9:32 PM, Ard Biesheuvel
<ard.biesheuvel at linaro.org> wrote:
> The 100c08 scratch page is mapped using dma_map_page() before the TTM
> layer has had a chance to set the DMA mask. This means we are still
> running with the default of 32 when this code executes, and this causes
> problems for platforms with no memory below 4 GB (such as AMD Seattle).
>
> So move the dma_map_page() to the .init hook, which executes after the
> DMA mask has been set.

The comments I made on gf100 also apply here. The only difference is
that you will have to create the oneinit hook, which does not exist in
this file.

Thanks!
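For nv50 that would mean introducing the hook and wiring it into the
function table, along these lines (a sketch; the remaining members of
nv50_fb_ are assumed unchanged):

    static int
    nv50_fb_oneinit(struct nvkm_fb *base)
    {
            struct nv50_fb *fb = nv50_fb(base);
            struct nvkm_device *device = fb->base.subdev.device;

            fb->r100c08_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
            if (!fb->r100c08_page)
                    return -ENOMEM;

            fb->r100c08 = dma_map_page(device->dev, fb->r100c08_page, 0,
                                       PAGE_SIZE, DMA_BIDIRECTIONAL);
            if (dma_mapping_error(device->dev, fb->r100c08))
                    return -EFAULT;

            return 0;
    }

    static const struct nvkm_fb_func
    nv50_fb_ = {
            .dtor = nv50_fb_dtor,
            .oneinit = nv50_fb_oneinit,     /* new */
            .init = nv50_fb_init,
            /* remaining members unchanged */
    };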