Alexandre Courbot
2014-Jul-16 08:20 UTC
[Nouveau] Questions about the Nouveau VM subsystem
Hi everyone,

I have been studying the VM code in order to come up with a better implementation of my fix for large pages usage on GK20A (https://lkml.org/lkml/2014/6/3/375). Following some older threads about this code (http://marc.info/?l=dri-devel&m=137618326022935) also left me kind of puzzled, and I wonder if we could not simplify and improve things there.

It seems like arbitrary constant values for page_shifts (usually 12) are used where the vmm->spg_shift value should be used instead, and some other members are used out of place. Here are a few questions to start:

- What is nouveau_mem::page_shift exactly used for? It is set here and there but never really used, and usually the value it is set to does not seem to affect how the memory is allocated. For instance, nouveau_vram_manager_new() will set it to whatever page_shift the BO has. I *thought* it should indicate the size and alignment of the pages described in the nouveau_mem object. Is this definition correct? In that case, shouldn't the memory allocator be the one that sets that field, instead of the callers?

- In nouveau_vm_map*(), why are we using vma->node->type to decide whether to use the small or big pages table? It seems like this should rather be decided by mem->page_shift, and more generally by how contiguous the physical memory is, in order to maximize large page use whenever possible. Right now the issue we have is that if the memory allocator returns 4k pages and the VMA used to map this memory uses 128k pages, we end up mapping 4k pages at 128k intervals, which cannot end well (see the sketch below).

Generally speaking, it seems like the VM subsystem allows plenty of specific cases to fail silently, and since I need to fix a few of them to allow GK20A to operate, I could as well try to fix them all. Answers to the questions above and general opinions about the state of this code would help me understand what needs to be done. :)

Thanks,
Alex.
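For illustration, a minimal standalone sketch of the mismatch described in the second question above: plain C, not nouveau code, and the shift values, addresses and page count are made up.

    /*
     * Standalone illustration (not nouveau code) of the mismatch above:
     * the allocator hands back 4 KiB pages, but the VMA was created with
     * a 128 KiB (shift 17) page size, so the mapping loop advances the
     * virtual address by the VMA page size while the physical pages only
     * cover 4 KiB each.
     */
    #include <stdio.h>

    #define SMALL_SHIFT 12  /* 4 KiB pages returned by the allocator */
    #define BIG_SHIFT   17  /* 128 KiB pages assumed by the VMA */

    int main(void)
    {
        unsigned long virt = 0x100000;    /* hypothetical VMA base */
        unsigned long phys = 0x20000000;  /* hypothetical first physical page */
        int npages = 4;                   /* four 4 KiB pages to map */

        for (int i = 0; i < npages; i++) {
            printf("PTE %d: virt %#lx -> phys %#lx (covers %u KiB, backed by %u KiB)\n",
                   i, virt, phys,
                   1u << (BIG_SHIFT - 10), 1u << (SMALL_SHIFT - 10));
            virt += 1ul << BIG_SHIFT;   /* VMA strides in 128 KiB steps... */
            phys += 1ul << SMALL_SHIFT; /* ...but only 4 KiB of memory exists per step */
        }
        return 0;
    }

Each PTE written this way claims to cover a 128 KiB page while only 4 KiB of backing memory advances per step, so most of every large page points at memory that was never allocated for this buffer.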
On Wed, Jul 16, 2014 at 6:20 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:
> Hi everyone,
>
> I have been studying the VM code in order to come up with a better implementation of my fix for large pages usage on GK20A (https://lkml.org/lkml/2014/6/3/375). Following some older threads about this code (http://marc.info/?l=dri-devel&m=137618326022935) also left me kind of puzzled, and I wonder if we could not simplify and improve things there.

The VM code is quite a bit unloved, and has lots of areas that can/need to be improved at some point. Let me see if I can wrap my head around / remember what/why some things are the way they are.

> It seems like arbitrary constant values for page_shifts (usually 12) are used where the vmm->spg_shift value should be used instead, and some other members are used out of place. Here are a few questions to start:
>
> - What is nouveau_mem::page_shift exactly used for? It is set here and there but never really used, and usually the value it is set to does not seem to affect how the memory is allocated. For instance, nouveau_vram_manager_new() will set it to whatever page_shift the BO has. I *thought* it should indicate the size and alignment of the pages described in the nouveau_mem object. Is this definition correct? In that case, shouldn't the memory allocator be the one that sets that field, instead of the callers?

Firstly, the caller decides whether or not it wants to use large pages (currently the heuristic is "is the buffer >256KiB?") instead of the allocator, because we have to keep the GPU virtual address of buffers constant. On Tesla there are no dual page tables like on Fermi and up; one has to decide whether an entire range is going to be large or small pages ahead of time.

The way things work currently is that when userspace allocates a BO, it specifies the memory spaces where it's OK to use the buffer from. If the caller says (VRAM && GART) on Tesla, then we will allocate small pages only, so that if TTM kicks the buffer out to system memory (where we probably can't guarantee we'll get a whole heap of 64KiB pages), we can still render to/from it at the same virtual address, without having to move it back to VRAM first. If (VRAM && size > 256KiB) we'll ask for large pages, but then we'll never be able to use the buffer without migrating it back to VRAM as necessary.

Ok. So. A nouveau_bo generally has a single nouveau_mem associated with it, *except* when TTM is doing a buffer move: a second one appears for the destination of the move, and replaces the first when the move is finished. Again, the source and destination page sizes may differ, so the value can't be taken from the nouveau_bo. You can think of nouveau_bo::page_shift as "the page size the buffer will have when it's available to be used as a source/destination", while nouveau_mem::page_shift is the actual page size of whatever allocation is backing the nouveau_bo at that time.

I hope that made sense. It's entirely possible I failed at explaining :P
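In rough pseudo-C, the page-size choice described above looks something like the following. This is an illustrative sketch, not the actual nouveau_bo_new() code; the function name and shift values are made up, though spg_shift/lpg_shift mirror the vmm fields mentioned in this thread.

    /*
     * Paraphrase of the heuristic described above: the caller picks the
     * page shift up front, because on Tesla the whole range must be
     * either small or large pages and the virtual address has to stay
     * constant even if TTM migrates the buffer.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define SPG_SHIFT 12  /* small (system) page shift, vmm->spg_shift in the driver */
    #define LPG_SHIFT 16  /* large page shift; 64 KiB on Tesla, as mentioned above */

    static uint32_t choose_bo_page_shift(uint64_t size, bool vram_ok, bool gart_ok)
    {
        /*
         * If the buffer may ever live in system memory (GART), stick to
         * small pages so it can stay mapped at the same virtual address
         * after being kicked out of VRAM.
         */
        if (gart_ok)
            return SPG_SHIFT;

        /* VRAM-only and big enough: worth asking for large pages. */
        if (vram_ok && size > 256 * 1024)
            return LPG_SHIFT;

        return SPG_SHIFT;
    }

The point being that the decision is made once, per BO, at allocation time, precisely because the page size of the whole range is fixed for its lifetime on Tesla.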
> - In nouveau_vm_map*(), why are we using vma->node->type to decide whether to use the small or big pages table? It seems like this should rather be decided by mem->page_shift, and more generally by how contiguous the physical memory is, in order to maximize large page use whenever possible. Right now the issue we have is that if the memory allocator returns 4k pages and the VMA used to map this memory uses 128k pages, we end up mapping 4k pages at 128k intervals, which cannot end well.

Because vma->node->type maps to what was requested in the nouveau_vm_get() call that allocated the address space. We should, perhaps, add assertions to make sure that when map() is called, nouveau_mem::page_shift does indeed match (see the sketch below). I did briefly think we should just pass nouveau_mem into vm_get() instead of size+page_shift, but there are cases where we want to allocate address space without yet having memory to back it. So that's a non-starter.

Most, but not all, of the "12"s actually refer to the granularity of the nouveau_mm that's created in vm_create(), and have nothing to do with either the system or GPU page size. This was a very large fail on my part to not use a define for it, and as you noticed, there are a couple of places where it's messed up.

Loads of stuff to be improved there. It's been a matter of "there's been worse stuff to get to first" until now, where it's actually a problem for you guys.

Hopefully that helped somewhat,
Ben.

> Generally speaking, it seems like the VM subsystem allows plenty of specific cases to fail silently, and since I need to fix a few of them to allow GK20A to operate, I could as well try to fix them all. Answers to the questions above and general opinions about the state of this code would help me understand what needs to be done. :)
>
> Thanks,
> Alex.
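To make the assertion idea above concrete, here is a rough sketch. It is untested and not a patch: the helper name is made up, and it assumes it would live somewhere in the vm code where the struct nouveau_vma and struct nouveau_mem declarations are already in scope; only the two fields named in this thread are touched.

    /* Rough sketch, not a tested patch: the helper name is invented, and
     * the structure declarations are assumed to come from the driver's
     * own headers. */
    #include <linux/bug.h>

    static bool
    nouveau_vm_page_shift_matches(struct nouveau_vma *vma, struct nouveau_mem *mem)
    {
        /* vma->node->type: the page shift the address space was allocated
         * with in nouveau_vm_get().
         * mem->page_shift: the page size of the allocation that actually
         * backs this mapping. */
        return !WARN_ON(vma->node->type != mem->page_shift);
    }

nouveau_vm_map() and friends could then bail out (or at least complain loudly) when this returns false, instead of silently writing small pages at large-page intervals.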