thr3ads.net - Nouveau - [Nouveau] Fixing nouveau for >4k PAGE

If this information is useful, please help other people find it:
Share via:

Benjamin Herrenschmidt

2013-Nov-29 06:01 UTC

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

On Thu, 2013-08-29 at 16:49 +1000, Ben Skeggs wrote:
> > Additionally the current code is broken in that the upper layer in
> > vm/base.c doesn't increment "pte" by the right amount.
> >
> > Now, if those two assertions can be made always true:
> >
> >  - Those two functions (map_sg and map_sg_table) never deal with the
> > "big" case.
> >
> >  - An object is always mapped at a card address that is a multiple
> > of PAGE_SIZE (ie, invividual PAGE_SIZE pages don't cross pde
boundaries
> > when mapped)
> I think these two restrictions are reasonable to enforce, and we should do
so.
> 
> >
> > Then we can probably simplify the code significantly *and* go back to
> > handling PAGE_SIZE pages in the vmm->map_sg() instead of breaking
them
> > up a level above like I do.
> Sounds good!
Ok so I experimented with that approach a bit. The code could probably
use further simplifications and cleanups but it seems to work. Note that
I haven't been able to test much more than the fbdev and the DDX without
3d accel since GLX is currently broken on nouveau big endian for other
reasons (no visuals) since the Gallium BE rework by Ajax (plus the
nv30/40 merge also introduced artifacts and instabilities on BE which I
haven't tracked down either).

The basic idea here is that the backend vmm->map_sg operates on system
PAGE_SIZE, which allows to continue passing a page array down as we do
today.

That means however that only the nv04 and nv41 backends handle that case
right now, the other ones will have to be fixed eventually but the fix
is rather easy.

Now it's possible that I've missed cases where card objects might be
allocated in vram that isn't system PAGE_SIZE aligned, in which case we
will have breakage due to having a single system PAGE crossing a PDE
boundary, you'll have to help me here figure that out though I haven't
hit any of my WARN_ON's so far :-)

Patch isn't S-O-B yet, first let me know what you think of the approach
(and maybe check I didn't break x86 :-)

diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
index ef3133e..44dc050 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
@@ -82,55 +82,77 @@ void
 nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
 			struct nouveau_mem *mem)
 {
+	/*
+	 * XXX Should the "12" in a couple of places here be replaced
+	 * with vmm->spg_shift for correctness ? Might help if we ever
+	 * support 64k card pages on 64k PAGE_SIZE systems
+	 */
 	struct nouveau_vm *vm = vma->vm;
 	struct nouveau_vmmgr *vmm = vm->vmm;
-	int big = vma->node->type != vmm->spg_shift;
 	u32 offset = vma->node->offset + (delta >> 12);
-	u32 bits = vma->node->type - 12;
-	u32 num  = length >> vma->node->type;
+	u32 shift = vma->node->type;
+	u32 order = PAGE_SHIFT - shift;
+	u32 num  = length >> PAGE_SHIFT;
 	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
-	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
-	u32 max  = 1 << (vmm->pgt_bits - bits);
-	unsigned m, sglen;
-	u32 end, len;
+	u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
+	u32 max  = 1 << vmm->pgt_bits;
+	u32 end, len, cardlen;
 	int i;
 	struct scatterlist *sg;
 
-	for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
-		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
-		sglen = sg_dma_len(sg) >> PAGE_SHIFT;
+	/* We don't handle "big" pages here */
+	if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
+		return;
 
-		end = pte + sglen;
-		if (unlikely(end >= max))
-			end = max;
-		len = end - pte;
+	/* We dont' handle objects that aren't PAGE_SIZE aligned either */
+	if (WARN_ON((offset << 12) & ~PAGE_MASK))
+		return;
 
-		for (m = 0; m < len; m++) {
-			dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
+	/* Iterate sglist elements */
+	for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
+		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
+		unsigned long m, sglen;
+		dma_addr_t addr;
 
-			vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
-			num--;
-			pte++;
+		/* Number of system pages and base address */
+		sglen = sg_dma_len(sg) >> PAGE_SHIFT;
+		addr = sg_dma_address(sg);
+
+		/* Iterate over system pages in the segment and
+		 * covered PDEs
+		 */
+		while(sglen) {
+			/* Number of card PTEs */
+			cardlen = sglen << order;
+			end = pte + cardlen;
+			if (unlikely(end > max))
+				end = max;
+			cardlen = end - pte;
 
-			if (num == 0)
-				goto finish;
-		}
-		if (unlikely(end >= max)) {
-			pde++;
-			pte = 0;
-		}
-		if (m < sglen) {
-			for (; m < sglen; m++) {
-				dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
+			/* Convert back to system pages after cropping */
+			len = cardlen >> order;
+
+			/* Ensure this fits system pages */
+			if (WARN_ON((len << order) != cardlen))
+				break;
 
+			/* For each system page in the segment */
+			for (m = 0; m < len; m++) {
 				vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
+				addr += PAGE_SIZE;
 				num--;
-				pte++;
+				pte += (1u << order);
 				if (num == 0)
 					goto finish;
 			}
-		}
+			sglen -= len;
 
+			/* Wrap to next PDE ? */
+			if (unlikely(end >= max)) {
+				pde++;
+				pte = 0;
+			}
+		}
 	}
 finish:
 	vmm->flush(vm);
@@ -143,28 +165,44 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64
length,
 	struct nouveau_vm *vm = vma->vm;
 	struct nouveau_vmmgr *vmm = vm->vmm;
 	dma_addr_t *list = mem->pages;
-	int big = vma->node->type != vmm->spg_shift;
 	u32 offset = vma->node->offset + (delta >> 12);
-	u32 bits = vma->node->type - 12;
-	u32 num  = length >> vma->node->type;
+	u32 shift = vma->node->type;
+	u32 order = PAGE_SHIFT - shift;
+	u32 num  = length >> PAGE_SHIFT;
 	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
-	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
-	u32 max  = 1 << (vmm->pgt_bits - bits);
-	u32 end, len;
+	u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
+	u32 max  = 1 << vmm->pgt_bits;
+	u32 end, len, cardlen;
+
+	/* We don't handle "big" pages here */
+	if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
+		return;
+
+	/* We dont' handle objects that aren't PAGE_SIZE aligned either */
+	if (WARN_ON((offset << 12) & ~PAGE_MASK))
+		return;
 
 	while (num) {
-		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
+		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
 
-		end = (pte + num);
-		if (unlikely(end >= max))
+		/* "num" is remaining system pages, check how many fit
+		 * in the PDE
+		 */
+		end = (pte + (num << order));
+		if (unlikely(end > max))
 			end = max;
-		len = end - pte;
+		cardlen = end - pte;
+		len = cardlen >> order;
+
+		/* Ensure this fits system pages */
+		if (WARN_ON((len << order) != cardlen))
+			break;
 
 		vmm->map_sg(vma, pgt, mem, pte, len, list);
 
 		num  -= len;
-		pte  += len;
 		list += len;
+		pte  += cardlen;
 		if (unlikely(end >= max)) {
 			pde++;
 			pte = 0;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
index ed45437..c1e7c11 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
@@ -38,14 +38,13 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct
nouveau_gpuobj *pgt,
 	       struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t *list)
 {
 	pte = 0x00008 + (pte * 4);
-	while (cnt) {
+	while (cnt--) {
 		u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
 		u32 phys = (u32)*list++;
-		while (cnt && page--) {
+		while (page--) {
 			nv_wo32(pgt, pte, phys | 3);
 			phys += NV04_PDMA_PAGE;
 			pte += 4;
-			cnt -= 1;
 		}
 	}
 }
diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
index 064c762..09570d7 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
@@ -42,14 +42,13 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct
nouveau_gpuobj *pgt,
 	       struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t *list)
 {
 	pte = pte * 4;
-	while (cnt) {
+	while (cnt--) {
 		u32 page = PAGE_SIZE / NV41_GART_PAGE;
 		u64 phys = (u64)*list++;
-		while (cnt && page--) {
+		while (page--) {
 			nv_wo32(pgt, pte, (phys >> 7) | 1);
 			phys += NV41_GART_PAGE;
 			pte += 4;
-			cnt -= 1;
 		}
 	}
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index c0fde6b..16dce89 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -178,7 +178,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, u32 flags,
 		*size = roundup(*size, (1 << nvbo->page_shift));
 		*align = max((1 <<  nvbo->page_shift), *align);
 	}
-
+	*align = roundup(*align, PAGE_SIZE);
 	*size = roundup(*size, PAGE_SIZE);
 }
 
@@ -221,7 +221,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
 	nvbo->page_shift = 12;
 	if (drm->client.base.vm) {
 		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
-			nvbo->page_shift = drm->client.base.vm->vmm->lpg_shift;
+			nvbo->page_shift = lpg_shift;
 	}
 
 	nouveau_bo_fixup_align(nvbo, flags, &align, &size);
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index 0843ebc..f255ff8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
 {
 	struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
 	struct nouveau_mem *node = mem->mm_node;
-	u64 size = mem->num_pages << 12;
+	u64 size = mem->num_pages << PAGE_SHIFT;
 
 	if (ttm->sg) {
 		node->sg = ttm->sg;
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 19e3757..b7fc456 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -252,8 +252,9 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
 
 	node->page_shift = 12;
 
-	ret = nouveau_vm_get(man->priv, mem->num_pages << 12,
node->page_shift,
-			     NV_MEM_ACCESS_RW, &node->vma[0]);
+	ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
+			     node->page_shift, NV_MEM_ACCESS_RW,
+			     &node->vma[0]);
 	if (ret) {
 		kfree(node);
 		return ret;

Ben Skeggs

2013-Dec-11 03:19 UTC

head link

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

On Fri, Nov 29, 2013 at 4:01 PM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:> On Thu, 2013-08-29 at 16:49 +1000, Ben Skeggs wrote:
>
>> > Additionally the current code is broken in that the upper layer in
>> > vm/base.c doesn't increment "pte" by the right
amount.
>> >
>> > Now, if those two assertions can be made always true:
>> >
>> >  - Those two functions (map_sg and map_sg_table) never deal with
the
>> > "big" case.
>> >
>> >  - An object is always mapped at a card address that is a multiple
>> > of PAGE_SIZE (ie, invividual PAGE_SIZE pages don't cross pde
boundaries
>> > when mapped)
>
>> I think these two restrictions are reasonable to enforce, and we should
do so.
>>
>> >
>> > Then we can probably simplify the code significantly *and* go back
to
>> > handling PAGE_SIZE pages in the vmm->map_sg() instead of
breaking them
>> > up a level above like I do.
>
>> Sounds good!
>
> Ok so I experimented with that approach a bit. The code could probably
> use further simplifications and cleanups but it seems to work. Note that
> I haven't been able to test much more than the fbdev and the DDX
without
> 3d accel since GLX is currently broken on nouveau big endian for other
> reasons (no visuals) since the Gallium BE rework by Ajax (plus the
> nv30/40 merge also introduced artifacts and instabilities on BE which I
> haven't tracked down either).
>
> The basic idea here is that the backend vmm->map_sg operates on system
> PAGE_SIZE, which allows to continue passing a page array down as we do
> today.
>
> That means however that only the nv04 and nv41 backends handle that case
> right now, the other ones will have to be fixed eventually but the fix
> is rather easy.
>
> Now it's possible that I've missed cases where card objects might
be
> allocated in vram that isn't system PAGE_SIZE aligned, in which case we
> will have breakage due to having a single system PAGE crossing a PDE
> boundary, you'll have to help me here figure that out though I
haven't
> hit any of my WARN_ON's so far :-)
>
> Patch isn't S-O-B yet, first let me know what you think of the approach
> (and maybe check I didn't break x86 :-)Hey Ben,

I definitely think the approach is the one that makes the most sense,
for sure.  The patch so far looks fine also, minor comments inline,
but nothing exciting to add really.
>
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> index ef3133e..44dc050 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> @@ -82,55 +82,77 @@ void
>  nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>                         struct nouveau_mem *mem)
>  {
> +       /*
> +        * XXX Should the "12" in a couple of places here be
replaced
> +        * with vmm->spg_shift for correctness ? Might help if we ever
> +        * support 64k card pages on 64k PAGE_SIZE systems
> +        */The only "12"s that were left (yes, i missed some too obviously) were
intended to just be used to represent the smallest allocation unit for
the address space allocator, and not necessarily related to the
small/large page shift or the host page size.  However, it's probably
completely useless to have an allocation unit smaller than the small
page size anyway, so, go ahead.

I didn't review the map_sg_table hunks explicitly, assuming just
changes in similar spirit to map_sg.
>         struct nouveau_vm *vm = vma->vm;
>         struct nouveau_vmmgr *vmm = vm->vmm;
> -       int big = vma->node->type != vmm->spg_shift;
>         u32 offset = vma->node->offset + (delta >> 12);
> -       u32 bits = vma->node->type - 12;
> -       u32 num  = length >> vma->node->type;
> +       u32 shift = vma->node->type;
> +       u32 order = PAGE_SHIFT - shift;
> +       u32 num  = length >> PAGE_SHIFT;
>         u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> -       u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1))
>> bits;
> -       u32 max  = 1 << (vmm->pgt_bits - bits);
> -       unsigned m, sglen;
> -       u32 end, len;
> +       u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
> +       u32 max  = 1 << vmm->pgt_bits;
> +       u32 end, len, cardlen;
>         int i;
>         struct scatterlist *sg;
>
> -       for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> -               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> -               sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> +       /* We don't handle "big" pages here */
> +       if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
> +               return;
>
> -               end = pte + sglen;
> -               if (unlikely(end >= max))
> -                       end = max;
> -               len = end - pte;
> +       /* We dont' handle objects that aren't PAGE_SIZE aligned
either */
> +       if (WARN_ON((offset << 12) & ~PAGE_MASK))
> +               return;
>
> -               for (m = 0; m < len; m++) {
> -                       dma_addr_t addr = sg_dma_address(sg) + (m <<
PAGE_SHIFT);
> +       /* Iterate sglist elements */
> +       for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> +               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
> +               unsigned long m, sglen;
> +               dma_addr_t addr;
>
> -                       vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
> -                       num--;
> -                       pte++;
> +               /* Number of system pages and base address */
> +               sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> +               addr = sg_dma_address(sg);
> +
> +               /* Iterate over system pages in the segment and
> +                * covered PDEs
> +                */
> +               while(sglen) {
> +                       /* Number of card PTEs */
> +                       cardlen = sglen << order;
> +                       end = pte + cardlen;
> +                       if (unlikely(end > max))
> +                               end = max;
> +                       cardlen = end - pte;
>
> -                       if (num == 0)
> -                               goto finish;
> -               }
> -               if (unlikely(end >= max)) {
> -                       pde++;
> -                       pte = 0;
> -               }
> -               if (m < sglen) {
> -                       for (; m < sglen; m++) {
> -                               dma_addr_t addr = sg_dma_address(sg) + (m
<< PAGE_SHIFT);
> +                       /* Convert back to system pages after cropping */
> +                       len = cardlen >> order;
> +
> +                       /* Ensure this fits system pages */
> +                       if (WARN_ON((len << order) != cardlen))
> +                               break;
>
> +                       /* For each system page in the segment */
> +                       for (m = 0; m < len; m++) {
>                                 vmm->map_sg(vma, pgt, mem, pte, 1,
&addr);
> +                               addr += PAGE_SIZE;
>                                 num--;
> -                               pte++;
> +                               pte += (1u << order);
>                                 if (num == 0)
>                                         goto finish;
>                         }
> -               }
> +                       sglen -= len;
>
> +                       /* Wrap to next PDE ? */
> +                       if (unlikely(end >= max)) {
> +                               pde++;
> +                               pte = 0;
> +                       }
> +               }
>         }
>  finish:
>         vmm->flush(vm);
> @@ -143,28 +165,44 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta,
u64 length,
>         struct nouveau_vm *vm = vma->vm;
>         struct nouveau_vmmgr *vmm = vm->vmm;
>         dma_addr_t *list = mem->pages;
> -       int big = vma->node->type != vmm->spg_shift;
>         u32 offset = vma->node->offset + (delta >> 12);
> -       u32 bits = vma->node->type - 12;
> -       u32 num  = length >> vma->node->type;
> +       u32 shift = vma->node->type;
> +       u32 order = PAGE_SHIFT - shift;
> +       u32 num  = length >> PAGE_SHIFT;
>         u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> -       u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1))
>> bits;
> -       u32 max  = 1 << (vmm->pgt_bits - bits);
> -       u32 end, len;
> +       u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
> +       u32 max  = 1 << vmm->pgt_bits;
> +       u32 end, len, cardlen;
> +
> +       /* We don't handle "big" pages here */
> +       if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
> +               return;
> +
> +       /* We dont' handle objects that aren't PAGE_SIZE aligned
either */
> +       if (WARN_ON((offset << 12) & ~PAGE_MASK))
> +               return;
>
>         while (num) {
> -               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> +               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
>
> -               end = (pte + num);
> -               if (unlikely(end >= max))
> +               /* "num" is remaining system pages, check how
many fit
> +                * in the PDE
> +                */
> +               end = (pte + (num << order));
> +               if (unlikely(end > max))
>                         end = max;
> -               len = end - pte;
> +               cardlen = end - pte;
> +               len = cardlen >> order;
> +
> +               /* Ensure this fits system pages */
> +               if (WARN_ON((len << order) != cardlen))
> +                       break;
>
>                 vmm->map_sg(vma, pgt, mem, pte, len, list);
>
>                 num  -= len;
> -               pte  += len;
>                 list += len;
> +               pte  += cardlen;I think this all looks OK too.
>                 if (unlikely(end >= max)) {
>                         pde++;
>                         pte = 0;
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> index ed45437..c1e7c11 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> @@ -38,14 +38,13 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct
nouveau_gpuobj *pgt,
>                struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t *list)
>  {
>         pte = 0x00008 + (pte * 4);
> -       while (cnt) {
> +       while (cnt--) {
>                 u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
>                 u32 phys = (u32)*list++;
> -               while (cnt && page--) {
> +               while (page--) {
>                         nv_wo32(pgt, pte, phys | 3);
>                         phys += NV04_PDMA_PAGE;
>                         pte += 4;
> -                       cnt -= 1;
>                 }
>         }
>  }Ack
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> index 064c762..09570d7 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> @@ -42,14 +42,13 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct
nouveau_gpuobj *pgt,
>                struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t *list)
>  {
>         pte = pte * 4;
> -       while (cnt) {
> +       while (cnt--) {
>                 u32 page = PAGE_SIZE / NV41_GART_PAGE;
>                 u64 phys = (u64)*list++;
> -               while (cnt && page--) {
> +               while (page--) {
>                         nv_wo32(pgt, pte, (phys >> 7) | 1);
>                         phys += NV41_GART_PAGE;
>                         pte += 4;
> -                       cnt -= 1;
>                 }
>         }
>  }Ack
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c
b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index c0fde6b..16dce89 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -178,7 +178,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, u32
flags,
>                 *size = roundup(*size, (1 << nvbo->page_shift));
>                 *align = max((1 <<  nvbo->page_shift), *align);
>         }
> -
> +       *align = roundup(*align, PAGE_SIZE);
>         *size = roundup(*size, PAGE_SIZE);
>  }
>
> @@ -221,7 +221,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int
align,
>         nvbo->page_shift = 12;
>         if (drm->client.base.vm) {
>                 if (!(flags & TTM_PL_FLAG_TT) && size > 256
* 1024)
> -                       nvbo->page_shift =
drm->client.base.vm->vmm->lpg_shift;
> +                       nvbo->page_shift = lpg_shift;
>         }Ack both hunks.
>
>         nouveau_bo_fixup_align(nvbo, flags, &align, &size);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> index 0843ebc..f255ff8 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> @@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg
*mem)
>  {
>         struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
>         struct nouveau_mem *node = mem->mm_node;
> -       u64 size = mem->num_pages << 12;
> +       u64 size = mem->num_pages << PAGE_SHIFT;Ack.  However, a patch doing this already exists in the Nouveau kernel
tree
(http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=19ab15062b8fde70ff04784bd49abc330e92310d).

>
>         if (ttm->sg) {
>                 node->sg = ttm->sg;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> index 19e3757..b7fc456 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> @@ -252,8 +252,9 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
>
>         node->page_shift = 12;
>
> -       ret = nouveau_vm_get(man->priv, mem->num_pages << 12,
node->page_shift,
> -                            NV_MEM_ACCESS_RW, &node->vma[0]);
> +       ret = nouveau_vm_get(man->priv, mem->num_pages <<
PAGE_SHIFT,
> +                            node->page_shift, NV_MEM_ACCESS_RW,
> +                            &node->vma[0]);Ack.
>         if (ret) {
>                 kfree(node);
>                 return ret;
>
>

Patrick Baggett

2013-Dec-11 14:44 UTC

head link

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

I'm looking forward to something that fixes page size bugs. I still have
the SPARC box that I reported but #58984 [1] on and I'm ready to test
changes. Is there an easy way to apply these patches to nouveau master and
try it out? SPARC has (default) 8KB page size, though, not 64KB.



[1] https://bugs.freedesktop.org/show_bug.cgi?id=58984


On Tue, Dec 10, 2013 at 9:19 PM, Ben Skeggs <skeggsb at gmail.com> wrote:
> On Fri, Nov 29, 2013 at 4:01 PM, Benjamin Herrenschmidt
> <benh at kernel.crashing.org> wrote:
> > On Thu, 2013-08-29 at 16:49 +1000, Ben Skeggs wrote:
> >
> >> > Additionally the current code is broken in that the upper
layer in
> >> > vm/base.c doesn't increment "pte" by the right
amount.
> >> >
> >> > Now, if those two assertions can be made always true:
> >> >
> >> >  - Those two functions (map_sg and map_sg_table) never deal
with the
> >> > "big" case.
> >> >
> >> >  - An object is always mapped at a card address that is a
multiple
> >> > of PAGE_SIZE (ie, invividual PAGE_SIZE pages don't cross
pde
> boundaries
> >> > when mapped)
> >
> >> I think these two restrictions are reasonable to enforce, and we
should
> do so.
> >>
> >> >
> >> > Then we can probably simplify the code significantly *and* go
back to
> >> > handling PAGE_SIZE pages in the vmm->map_sg() instead of
breaking them
> >> > up a level above like I do.
> >
> >> Sounds good!
> >
> > Ok so I experimented with that approach a bit. The code could probably
> > use further simplifications and cleanups but it seems to work. Note
that
> > I haven't been able to test much more than the fbdev and the DDX
without
> > 3d accel since GLX is currently broken on nouveau big endian for other
> > reasons (no visuals) since the Gallium BE rework by Ajax (plus the
> > nv30/40 merge also introduced artifacts and instabilities on BE which
I
> > haven't tracked down either).
> >
> > The basic idea here is that the backend vmm->map_sg operates on
system
> > PAGE_SIZE, which allows to continue passing a page array down as we do
> > today.
> >
> > That means however that only the nv04 and nv41 backends handle that
case
> > right now, the other ones will have to be fixed eventually but the fix
> > is rather easy.
> >
> > Now it's possible that I've missed cases where card objects
might be
> > allocated in vram that isn't system PAGE_SIZE aligned, in which
case we
> > will have breakage due to having a single system PAGE crossing a PDE
> > boundary, you'll have to help me here figure that out though I
haven't
> > hit any of my WARN_ON's so far :-)
> >
> > Patch isn't S-O-B yet, first let me know what you think of the
approach
> > (and maybe check I didn't break x86 :-)
> Hey Ben,
>
> I definitely think the approach is the one that makes the most sense,
> for sure.  The patch so far looks fine also, minor comments inline,
> but nothing exciting to add really.
>
> >
> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > index ef3133e..44dc050 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > @@ -82,55 +82,77 @@ void
> >  nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64
length,
> >                         struct nouveau_mem *mem)
> >  {
> > +       /*
> > +        * XXX Should the "12" in a couple of places here be
replaced
> > +        * with vmm->spg_shift for correctness ? Might help if we
ever
> > +        * support 64k card pages on 64k PAGE_SIZE systems
> > +        */
> The only "12"s that were left (yes, i missed some too obviously)
were
> intended to just be used to represent the smallest allocation unit for
> the address space allocator, and not necessarily related to the
> small/large page shift or the host page size.  However, it's probably
> completely useless to have an allocation unit smaller than the small
> page size anyway, so, go ahead.
>
> I didn't review the map_sg_table hunks explicitly, assuming just
> changes in similar spirit to map_sg.
>
> >         struct nouveau_vm *vm = vma->vm;
> >         struct nouveau_vmmgr *vmm = vm->vmm;
> > -       int big = vma->node->type != vmm->spg_shift;
> >         u32 offset = vma->node->offset + (delta >> 12);
> > -       u32 bits = vma->node->type - 12;
> > -       u32 num  = length >> vma->node->type;
> > +       u32 shift = vma->node->type;
> > +       u32 order = PAGE_SHIFT - shift;
> > +       u32 num  = length >> PAGE_SHIFT;
> >         u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> > -       u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1))
>> bits;
> > -       u32 max  = 1 << (vmm->pgt_bits - bits);
> > -       unsigned m, sglen;
> > -       u32 end, len;
> > +       u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
> > +       u32 max  = 1 << vmm->pgt_bits;
> > +       u32 end, len, cardlen;
> >         int i;
> >         struct scatterlist *sg;
> >
> > -       for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> > -               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> > -               sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> > +       /* We don't handle "big" pages here */
> > +       if (WARN_ON(shift != vmm->spg_shift || shift >
PAGE_SHIFT))
> > +               return;
> >
> > -               end = pte + sglen;
> > -               if (unlikely(end >= max))
> > -                       end = max;
> > -               len = end - pte;
> > +       /* We dont' handle objects that aren't PAGE_SIZE
aligned either
> */
> > +       if (WARN_ON((offset << 12) & ~PAGE_MASK))
> > +               return;
> >
> > -               for (m = 0; m < len; m++) {
> > -                       dma_addr_t addr = sg_dma_address(sg) + (m
<<
> PAGE_SHIFT);
> > +       /* Iterate sglist elements */
> > +       for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> > +               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
> > +               unsigned long m, sglen;
> > +               dma_addr_t addr;
> >
> > -                       vmm->map_sg(vma, pgt, mem, pte, 1,
&addr);
> > -                       num--;
> > -                       pte++;
> > +               /* Number of system pages and base address */
> > +               sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> > +               addr = sg_dma_address(sg);
> > +
> > +               /* Iterate over system pages in the segment and
> > +                * covered PDEs
> > +                */
> > +               while(sglen) {
> > +                       /* Number of card PTEs */
> > +                       cardlen = sglen << order;
> > +                       end = pte + cardlen;
> > +                       if (unlikely(end > max))
> > +                               end = max;
> > +                       cardlen = end - pte;
> >
> > -                       if (num == 0)
> > -                               goto finish;
> > -               }
> > -               if (unlikely(end >= max)) {
> > -                       pde++;
> > -                       pte = 0;
> > -               }
> > -               if (m < sglen) {
> > -                       for (; m < sglen; m++) {
> > -                               dma_addr_t addr = sg_dma_address(sg) +
> (m << PAGE_SHIFT);
> > +                       /* Convert back to system pages after cropping
*/
> > +                       len = cardlen >> order;
> > +
> > +                       /* Ensure this fits system pages */
> > +                       if (WARN_ON((len << order) != cardlen))
> > +                               break;
> >
> > +                       /* For each system page in the segment */
> > +                       for (m = 0; m < len; m++) {
> >                                 vmm->map_sg(vma, pgt, mem, pte, 1,
> &addr);
> > +                               addr += PAGE_SIZE;
> >                                 num--;
> > -                               pte++;
> > +                               pte += (1u << order);
> >                                 if (num == 0)
> >                                         goto finish;
> >                         }
> > -               }
> > +                       sglen -= len;
> >
> > +                       /* Wrap to next PDE ? */
> > +                       if (unlikely(end >= max)) {
> > +                               pde++;
> > +                               pte = 0;
> > +                       }
> > +               }
> >         }
> >  finish:
> >         vmm->flush(vm);
> > @@ -143,28 +165,44 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64
> delta, u64 length,
> >         struct nouveau_vm *vm = vma->vm;
> >         struct nouveau_vmmgr *vmm = vm->vmm;
> >         dma_addr_t *list = mem->pages;
> > -       int big = vma->node->type != vmm->spg_shift;
> >         u32 offset = vma->node->offset + (delta >> 12);
> > -       u32 bits = vma->node->type - 12;
> > -       u32 num  = length >> vma->node->type;
> > +       u32 shift = vma->node->type;
> > +       u32 order = PAGE_SHIFT - shift;
> > +       u32 num  = length >> PAGE_SHIFT;
> >         u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> > -       u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1))
>> bits;
> > -       u32 max  = 1 << (vmm->pgt_bits - bits);
> > -       u32 end, len;
> > +       u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
> > +       u32 max  = 1 << vmm->pgt_bits;
> > +       u32 end, len, cardlen;
> > +
> > +       /* We don't handle "big" pages here */
> > +       if (WARN_ON(shift != vmm->spg_shift || shift >
PAGE_SHIFT))
> > +               return;
> > +
> > +       /* We dont' handle objects that aren't PAGE_SIZE
aligned either
> */
> > +       if (WARN_ON((offset << 12) & ~PAGE_MASK))
> > +               return;
> >
> >         while (num) {
> > -               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> > +               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
> >
> > -               end = (pte + num);
> > -               if (unlikely(end >= max))
> > +               /* "num" is remaining system pages, check
how many fit
> > +                * in the PDE
> > +                */
> > +               end = (pte + (num << order));
> > +               if (unlikely(end > max))
> >                         end = max;
> > -               len = end - pte;
> > +               cardlen = end - pte;
> > +               len = cardlen >> order;
> > +
> > +               /* Ensure this fits system pages */
> > +               if (WARN_ON((len << order) != cardlen))
> > +                       break;
> >
> >                 vmm->map_sg(vma, pgt, mem, pte, len, list);
> >
> >                 num  -= len;
> > -               pte  += len;
> >                 list += len;
> > +               pte  += cardlen;
> I think this all looks OK too.
>
> >                 if (unlikely(end >= max)) {
> >                         pde++;
> >                         pte = 0;
> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > index ed45437..c1e7c11 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > @@ -38,14 +38,13 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct
> nouveau_gpuobj *pgt,
> >                struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t
> *list)
> >  {
> >         pte = 0x00008 + (pte * 4);
> > -       while (cnt) {
> > +       while (cnt--) {
> >                 u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
> >                 u32 phys = (u32)*list++;
> > -               while (cnt && page--) {
> > +               while (page--) {
> >                         nv_wo32(pgt, pte, phys | 3);
> >                         phys += NV04_PDMA_PAGE;
> >                         pte += 4;
> > -                       cnt -= 1;
> >                 }
> >         }
> >  }
> Ack
>
> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > index 064c762..09570d7 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > @@ -42,14 +42,13 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct
> nouveau_gpuobj *pgt,
> >                struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t
> *list)
> >  {
> >         pte = pte * 4;
> > -       while (cnt) {
> > +       while (cnt--) {
> >                 u32 page = PAGE_SIZE / NV41_GART_PAGE;
> >                 u64 phys = (u64)*list++;
> > -               while (cnt && page--) {
> > +               while (page--) {
> >                         nv_wo32(pgt, pte, (phys >> 7) | 1);
> >                         phys += NV41_GART_PAGE;
> >                         pte += 4;
> > -                       cnt -= 1;
> >                 }
> >         }
> >  }
> Ack
>
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c
> b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > index c0fde6b..16dce89 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -178,7 +178,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo,
u32
> flags,
> >                 *size = roundup(*size, (1 <<
nvbo->page_shift));
> >                 *align = max((1 <<  nvbo->page_shift),
*align);
> >         }
> > -
> > +       *align = roundup(*align, PAGE_SIZE);
> >         *size = roundup(*size, PAGE_SIZE);
> >  }
> >
> > @@ -221,7 +221,7 @@ nouveau_bo_new(struct drm_device *dev, int size,
int
> align,
> >         nvbo->page_shift = 12;
> >         if (drm->client.base.vm) {
> >                 if (!(flags & TTM_PL_FLAG_TT) && size >
256 * 1024)
> > -                       nvbo->page_shift >
drm->client.base.vm->vmm->lpg_shift;
> > +                       nvbo->page_shift = lpg_shift;
> >         }
> Ack both hunks.
>
> >
> >         nouveau_bo_fixup_align(nvbo, flags, &align, &size);
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > index 0843ebc..f255ff8 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > @@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct
ttm_mem_reg
> *mem)
> >  {
> >         struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be
*)ttm;
> >         struct nouveau_mem *node = mem->mm_node;
> > -       u64 size = mem->num_pages << 12;
> > +       u64 size = mem->num_pages << PAGE_SHIFT;
> Ack.  However, a patch doing this already exists in the Nouveau kernel
> tree (
>
http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=19ab15062b8fde70ff04784bd49abc330e92310d
> ).
>
>
> >
> >         if (ttm->sg) {
> >                 node->sg = ttm->sg;
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > index 19e3757..b7fc456 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > @@ -252,8 +252,9 @@ nv04_gart_manager_new(struct ttm_mem_type_manager
> *man,
> >
> >         node->page_shift = 12;
> >
> > -       ret = nouveau_vm_get(man->priv, mem->num_pages <<
12,
> node->page_shift,
> > -                            NV_MEM_ACCESS_RW, &node->vma[0]);
> > +       ret = nouveau_vm_get(man->priv, mem->num_pages <<
PAGE_SHIFT,
> > +                            node->page_shift, NV_MEM_ACCESS_RW,
> > +                            &node->vma[0]);
> Ack.
>
> >         if (ret) {
> >                 kfree(node);
> >                 return ret;
> >
> >
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.freedesktop.org/archives/nouveau/attachments/20131211/a9ca2d87/attachment-0001.html>

Reasonably Related Threads

Search for more apparently analagous threads

Nouveau - Dec 2013 - Fixing nouveau for >4k PAGE_SIZE

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

Reasonably Related Threads