thr3ads.net - Nouveau - [Nouveau] Fixing nouveau for >4k PAGE

If this information is useful, please help other people find it:
Share via:

Benjamin Herrenschmidt

2013-Aug-11 00:41 UTC

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

Hi folks !

So I've been trying to figure out what it would take to make
nouveau work properly on architectures where PAGE_SIZE isn't
4k such as most ppc64's. An initial patch from Dave fixed a
bogon in nv41.c nv41_vm_map_sg() which was trying to handle
the case at that low level, but this isn't enough, and after
a bit of digging, I also think that's not the right approach:

So, from what I can tell, subdev/vm/base.c is not clean vs. PAGE_SIZE
in a number of places unless I'm mistaken. For example, in
nouveau_vm_map_sg_table(), something like that:

		sglen = sg_dma_len(sg) >> PAGE_SHIFT;

		end = pte + sglen;

Seems to imply an assumption here that the "pte" is in multiple of
PAGE_SHIFT, but afaik, it's not. So further down, we do:

		for (m = 0; m < len; m++) {
			dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);

			vmm->map_sg(vma, pgt, mem, pte, 1, &addr);                       
			num--;
			pte++;

But in fact, inside vmm->map_sg, with the current code, we will have
incremented pte by more than 1 ... so we basically lose track here.

			if (num == 0)
				goto finish;
		}
		if (unlikely(end >= max)) {
			pde++;
			pte = 0;
		}

We need to similarly make sure we don't end up crossing accross two
pde's here, ie, that our 64k page isn't mapped at a non-multiple of
64k in the card space, where is that enforced ? (It might be, I don't
know). If it is, then we need to recalc the pde on each sub page.

So I'm thinking the right fix is to remove the inner loop in nv41.c
nv41_vm_map_sg() and similars, let those basically deal with card-size
PTEs exclusively, and move the breakdown logic inside
nouveau_vm_map_sg_table() and friendes, that's the only way it can get
pte and pde right anyway and it would generally work with all backends
(well, hopefully)

Now, to do that, I need a better understanding of the various things
in there since I'm not familiar with nouveau at all. What I think I've
figured out is with a few questions, it would be awesome if you could
answer them so I can have a shot at fixing it all :-)

 - There is spg_shift and lpg_shift in the backend vmm. My understanding
is those correspond to the supported small and large page shift respectively
in the card's MMU, correct ? On nv41 they are both 12.

 - vma->node->type indicates the desired page shift for a given vma
object we are trying to map. It may or may not match spg_shift. If it
doesn't, the 'big' flag gets set in the various vm/base.c functions,
which makes them use a different page table via:

		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];

 Question: Can that ever happen on nv41 ? Or rather, can node->type
ever be set to something that is neither vmm->spg_shift nor vmm->lpg_shift
?

 - vma->node->offset is the location in bytes in the card memory space
of the nouveau_vma object, right ?

 - In nouveau_vm_map_sg_table(), we take a "delta" argument. I assume
that's
an indication of where within the "vma" we want to map the sg list, ie
page
1 of the sg list gets mapped to page 1 of the VMA, correct ?

Thanks !

Ben.

Benjamin Herrenschmidt

2013-Aug-11 05:36 UTC

head link

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

On Sun, 2013-08-11 at 10:41 +1000, Benjamin Herrenschmidt
wrote:> Now, to do that, I need a better understanding of the various things
> in there since I'm not familiar with nouveau at all. What I think
I've
> figured out is with a few questions, it would be awesome if you could
> answer them so I can have a shot at fixing it all :-)
Ok, a few more questions :-)

 - in struct nouveau_mm_node, what unit are "offset" and
"length" ?

They *seem* to be hard wired to be in units of 4k (regardless of either
of system page size) which I assume means card small page size, but
"offset" is in bytes ? At least according to my parsing of the code in
nouveau_vm_map_at().

The big question then is whether that represents card address space or
system address space... I assume the former right ? So a function like
nouveau_vm_map_at() is solely concerned with mapping a piece of card
address space in the card VM and thus shouldn't be concerned by the
system PAGE_SIZE at all, right ?

IE. The only one we should actually care about here is
nouveau_vm_map_sg_table() or am I missing an important piece of the
puzzle ?

Additionally, nv41.c has only map_sg() callbacks, no map() callbacks,
should I assume that means that nouveau_vm_map() and nouveau_vm_map_at()
will never be called on these ?

 - In vm/base.c this construct appears regulary:

    struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];

Which makes me believe we have separate page tables for small vs. large
pages (in card mmu) (though I assume big is always 0 on nv40 unless
missed something, I want to make sure I'm not breaking everything
else...).

Thus I assume that each "pte" in a "big" page table maps a
page size
of 1 << vmm->lpg_shift, is that correct ?

vmm->pgt_bits is always the same however, thus I assume that PDEs always
map the same amount of space, but regions for large pages just have
fewer PTEs, which seem to correspond to what the code does here:

	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;

- My basic idea is to move the card page vs system PAGE breakdown
up into vm/base.c, which seems reasonably simple in the case of
nouveau_vm_map_sg_table() since we only call vmm->map_sg for one PTE at
a time, but things get a bit trickier with nouveau_vm_map_sg which
passes the whole page list down.

Following my idea would mean essentially also making it operate one PTE
at a time, thus potentially reducing the performance of the map
operations.

One option is to special case the PAGE_SIZE==12 case to keep the
existing behaviour and operate in "reduced" (one PTE per call)
mode in the other case but I dislike special casing like that (bitrots,
one code path gets untested/unfixed, etc...)

Another option would be to make struct nouveau_mem *mem ->pages always
be an array of 4k (ie, create a breakdown when constructing that list),
but I have yet to figure out what the impact would be (where that stuff
gets created) and whether this is realistic at all...

 - struct nouveau_mem is called "node" in various places (in
nouveau_ttm) which is confusing. What is the relationship between
nouveau_mem, nouveau_mm and nouveau_mm_node ? nouveau_mem seems to be
having things such as "offset" in multiple of PAGE_SIZE, but have a
"page_shift" member that appears to match the buffer object
page_shift,
hence seem to indicate that it is a card page shift...

Would it be possible, maybe, to add comments next to the fields in
those various data structure indicating in which units they are ?

 - Similar confusion arises with things like struct ttm_mem_reg *mem.
For example, in nouveau_ttm.c, I see statements like:

	ret = nouveau_vm_get(man->priv, mem->num_pages << 12,
node->page_shift,
			     NV_MEM_ACCESS_RW, &node->vma[0]);

Which seems to indicate that mem->num_pages is in multiple of 4k always,
though I would have though that a ttm object was in multiple of
PAGE_SIZE, am I wrong ?

Especially since the same object is later populated using:

	mem->start   = node->vma[0].offset >> PAGE_SHIFT;

So the "start" member is in units of PAGE_SHIFT here, not 12.

So I'm still a bit confused :-)

Cheers,
Ben.

Maarten Lankhorst

2013-Aug-11 06:17 UTC

head link

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

Op 11-08-13 07:36, Benjamin Herrenschmidt schreef:> On Sun, 2013-08-11 at 10:41 +1000, Benjamin Herrenschmidt wrote:
>> Now, to do that, I need a better understanding of the various things
>> in there since I'm not familiar with nouveau at all. What I think
I've
>> figured out is with a few questions, it would be awesome if you could
>> answer them so I can have a shot at fixing it all :-)
> Ok, a few more questions :-)
>
>  - in struct nouveau_mm_node, what unit are "offset" and
"length" ?Depends on the context. It's re-used a few times. In case of nouveau_vm
it's multiples of 1<<12, which is the smallest unit
a card can address.> They *seem* to be hard wired to be in units of 4k (regardless of either
> of system page size) which I assume means card small page size, but
> "offset" is in bytes ? At least according to my parsing of the
code in
> nouveau_vm_map_at().
Correct.> The big question then is whether that represents card address space or
> system address space... I assume the former right ? So a function like
> nouveau_vm_map_at() is solely concerned with mapping a piece of card
> address space in the card VM and thus shouldn't be concerned by the
> system PAGE_SIZE at all, right ?Former, but the code entangles system PAGE_SIZE and card PAGE_SIZE/SHIFT/MASK in
some cases.
> IE. The only one we should actually care about here is
> nouveau_vm_map_sg_table() or am I missing an important piece of the
> puzzle ?nouveau_vm_map_sg too.

nouveau_vm_map is special, and also used to map VRAM into BAR1/BAR3 by
subdev/bar code.> Additionally, nv41.c has only map_sg() callbacks, no map() callbacks,
> should I assume that means that nouveau_vm_map() and nouveau_vm_map_at()
> will never be called on these ?Correct. all cards before the nv50 family have no real vm. the BAR used for vram
is just an identity mapping,
not the entirety of VRAM may be accessible to the
system.>  - In vm/base.c this construct appears regulary:
>
>     struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
>
> Which makes me believe we have separate page tables for small vs. large
> pages (in card mmu) (though I assume big is always 0 on nv40 unless
> missed something, I want to make sure I'm not breaking everything
> else...).
>
> Thus I assume that each "pte" in a "big" page table
maps a page size
> of 1 << vmm->lpg_shift, is that correct ?Correct, nv50+ are the only ones that support large
pages.> vmm->pgt_bits is always the same however, thus I assume that PDEs always
> map the same amount of space, but regions for large pages just have
> fewer PTEs, which seem to correspond to what the code does here:
>
> 	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >>
bits;
>
> - My basic idea is to move the card page vs system PAGE breakdown
> up into vm/base.c, which seems reasonably simple in the case of
> nouveau_vm_map_sg_table() since we only call vmm->map_sg for one PTE at
> a time, but things get a bit trickier with nouveau_vm_map_sg which
> passes the whole page list down.
>
> Following my idea would mean essentially also making it operate one PTE
> at a time, thus potentially reducing the performance of the map
> operations.
>
> One option is to special case the PAGE_SIZE==12 case to keep the
> existing behaviour and operate in "reduced" (one PTE per call)
> mode in the other case but I dislike special casing like that (bitrots,
> one code path gets untested/unfixed, etc...)
>
> Another option would be to make struct nouveau_mem *mem ->pages always
> be an array of 4k (ie, create a breakdown when constructing that list),
> but I have yet to figure out what the impact would be (where that stuff
> gets created) and whether this is realistic at all...
>
>  - struct nouveau_mem is called "node" in various places (in
> nouveau_ttm) which is confusing. What is the relationship between
> nouveau_mem, nouveau_mm and nouveau_mm_node ? nouveau_mem seems to be
> having things such as "offset" in multiple of PAGE_SIZE, but have
a
> "page_shift" member that appears to match the buffer object
page_shift,
> hence seem to indicate that it is a card page shift...
>
> Would it be possible, maybe, to add comments next to the fields in
> those various data structure indicating in which units they are ?
>
>  - Similar confusion arises with things like struct ttm_mem_reg *mem.
> For example, in nouveau_ttm.c, I see statements like:
>
> 	ret = nouveau_vm_get(man->priv, mem->num_pages << 12,
node->page_shift,
> 			     NV_MEM_ACCESS_RW, &node->vma[0]);
>
> Which seems to indicate that mem->num_pages is in multiple of 4k always,
> though I would have though that a ttm object was in multiple of
> PAGE_SIZE, am I wrong ?
>
> Especially since the same object is later populated using:
>
> 	mem->start   = node->vma[0].offset >> PAGE_SHIFT;
>
> So the "start" member is in units of PAGE_SHIFT here, not 12.
>
> So I'm still a bit confused :-)
>The fun has been doubled because TTM expects PAGE units, so some of the
PAGE_SHIFT's are
genuine. Some may be a result of PAGE_SHIFT == 12, so honestly I don't know
the specific ones.

Apparently Analagous Threads

Search for more possibly parallel threads

Nouveau - Aug 2013 - Fixing nouveau for >4k PAGE_SIZE

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

[Nouveau] Fixing nouveau for >4k PAGE_SIZE

Apparently Analagous Threads