Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 0/9] Add support for SVM atomics in Nouveau
This series adds support to Nouveau for atomic memory operations on OpenCL
shared virtual memory (SVM). This is achieved using the atomic PTE bits on
the GPU to only permit atomic operations to system memory when a page is
not mapped in userspace on the CPU.

This is implemented by adding a mode to migrate_vma_pages() which unmaps
and isolates existing pages from the CPU and pins them. The original
userspace page table entries are migrated to point to device private pages
allocated by the driver. This allows the driver to enable GPU atomic access
to the page as it will receive a callback when CPU userspace needs to
access it.

In response to this callback the driver revokes the atomic access
permission from the GPU and migrates entries to point back to the original
page. The original page is unpinned as part of the migration operation
which also returns it to the LRU.

Patch 3 contains the bulk of the memory management changes to implement
unmap and pin.

Patches 6-9 extend Nouveau to use the new mode to allow system wide atomics
for OpenCL SVM to be implemented on Nouveau.

This has been tested using the latest upstream Mesa userspace with a simple
OpenCL test program which checks the results of atomic GPU operations on a
buffer whilst also writing to the same buffer from the CPU.

Problems yet to be addressed:

Recent changes to pin_user_pages() prevent the creation of pinned pages in
ZONE_MOVABLE. This series allows pinned pages to be created in ZONE_MOVABLE
as attempts to migrate may fail which would be fatal to userspace.

In this case migration of the pinned page is unnecessary as the page can be
unpinned at anytime by having the driver revoke atomic permission as it
does for the migrate_to_ram() callback. However a method of calling this
when memory needs to be moved has yet to be resolved so any discussion is
welcome.

Alistair Popple (9):
  mm/migrate.c: Always allow device private pages to migrate
  mm/migrate.c: Allow pfn flags to be passed to migrate_vma_setup()
  mm/migrate: Add a unmap and pin migration mode
  Documentation: Add unmap and pin to HMM
  hmm-tests: Add test for unmap and pin
  nouveau/dmem: Only map migrating pages
  nouveau/svm: Refactor nouveau_range_fault
  nouveau/dmem: Add support for multiple page types
  nouveau/svm: Implement atomic SVM access

 Documentation/vm/hmm.rst                      |  22 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c            |   4 +-
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_dmem.c        | 190 +++++++++++++++---
 drivers/gpu/drm/nouveau/nouveau_dmem.h        |   9 +
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 148 +++++++++++---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
 include/linux/migrate.h                       |   2 +
 include/linux/migrate_mode.h                  |   1 +
 lib/test_hmm.c                                | 109 ++++++++--
 lib/test_hmm_uapi.h                           |   1 +
 mm/migrate.c                                  |  82 +++++---
 tools/testing/selftests/vm/hmm-tests.c        |  49 +++++
 14 files changed, 524 insertions(+), 101 deletions(-)

-- 
2.20.1
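
For illustration, a minimal sketch of the "grant" half of the flow described
above, i.e. a driver requesting exclusive (atomic) access to a single page.
The migrate_vma_*() calls and the MIGRATE_PFN_PIN, MIGRATE_PFN_LOCKED and
MIGRATE_VMA_SELECT_SYSTEM flags are the interfaces used or added by this
series; the my_dev_*() helpers are hypothetical driver code and error
handling is omitted:

static int my_dev_grant_atomic_access(struct vm_area_struct *vma,
				      unsigned long addr)
{
	/* MIGRATE_PFN_PIN (patch 3) requests unmap-and-pin of the CPU page. */
	unsigned long src = MIGRATE_PFN_PIN;
	unsigned long dst = 0;
	struct migrate_vma args = {
		.vma	= vma,
		.src	= &src,
		.dst	= &dst,
		.start	= addr,
		.end	= addr + PAGE_SIZE,
		.flags	= MIGRATE_VMA_SELECT_SYSTEM,
	};
	struct page *spage, *dpage;

	if (migrate_vma_setup(&args))
		return -EFAULT;

	spage = migrate_pfn_to_page(src);
	if (spage && (src & MIGRATE_PFN_MIGRATE)) {
		/*
		 * Hypothetical helper: returns a locked device private page
		 * and records spage so it can be unpinned later from the
		 * migrate_to_ram() callback.
		 */
		dpage = my_dev_alloc_private_page(spage);
		dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
	}

	migrate_vma_pages(&args);
	if (src & MIGRATE_PFN_MIGRATE)
		my_dev_map_gpu_atomic(spage);	/* set the GPU atomic PTE bit */
	migrate_vma_finalize(&args);

	return 0;
}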
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 1/9] mm/migrate.c: Always allow device private pages to migrate
Device private pages are used to represent device memory that is not directly
accessible from the CPU. Extra references to a device private page are only
used to ensure the struct page itself remains valid whilst waiting for
migration entries. Therefore extra references should not prevent device
private page migration as this can lead to failures to migrate pages back to
the CPU which are fatal to the user process.

Signed-off-by: Alistair Popple <apopple at nvidia.com>
---
 mm/migrate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 20ca887ea769..053228559fd3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -405,8 +405,13 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	int nr = thp_nr_pages(page);
 
 	if (!mapping) {
-		/* Anonymous page without mapping */
-		if (page_count(page) != expected_count)
+		/*
+		 * Anonymous page without mapping. Device private pages should
+		 * never have extra references except during migration, but it
+		 * is safe to ignore these.
+		 */
+		if (!is_device_private_page(page) &&
+				page_count(page) != expected_count)
 			return -EAGAIN;
 
 		/* No turning back from here */
-- 
2.20.1
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 2/9] mm/migrate.c: Allow pfn flags to be passed to migrate_vma_setup()
Currently migrate_vma_setup() zeros both the src and dst pfn arrays, which
makes it impossible to pass per-pfn flags into migrate_vma_setup(). A future
patch introduces per-pfn flags, so remove the zeroing from
migrate_vma_setup() and instead have each caller zero both the src and dst
pfn arrays itself, ensuring existing callers are not affected.

Signed-off-by: Alistair Popple <apopple at nvidia.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 4 ++--
 lib/test_hmm.c                     | 6 ++++--
 mm/migrate.c                       | 1 -
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..d434783b272a 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -506,7 +506,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
 		unsigned long end, unsigned long page_shift,
 		struct kvm *kvm, unsigned long gpa)
 {
-	unsigned long src_pfn, dst_pfn = 0;
+	unsigned long src_pfn = 0, dst_pfn = 0;
 	struct migrate_vma mig;
 	struct page *dpage, *spage;
 	struct kvmppc_uvmem_page_pvt *pvt;
@@ -732,7 +732,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
 		unsigned long page_shift,
 		bool pagein)
 {
-	unsigned long src_pfn, dst_pfn = 0;
+	unsigned long src_pfn = 0, dst_pfn = 0;
 	struct migrate_vma mig;
 	struct page *spage;
 	unsigned long pfn;
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 80a78877bd93..98848b96ff09 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -696,6 +696,8 @@ static int dmirror_migrate(struct dmirror *dmirror,
 		if (next > vma->vm_end)
 			next = vma->vm_end;
 
+		memset(src_pfns, 0, ARRAY_SIZE(src_pfns));
+		memset(dst_pfns, 0, ARRAY_SIZE(dst_pfns));
 		args.vma = vma;
 		args.src = src_pfns;
 		args.dst = dst_pfns;
@@ -1025,8 +1027,8 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
 static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 {
 	struct migrate_vma args;
-	unsigned long src_pfns;
-	unsigned long dst_pfns;
+	unsigned long src_pfns = 0;
+	unsigned long dst_pfns = 0;
 	struct page *rpage;
 	struct dmirror *dmirror;
 	vm_fault_t ret;
diff --git a/mm/migrate.c b/mm/migrate.c
index 053228559fd3..fe8bb322e2e3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2874,7 +2874,6 @@ int migrate_vma_setup(struct migrate_vma *args)
 	if (!args->src || !args->dst)
 		return -EINVAL;
 
-	memset(args->src, 0, sizeof(*args->src) * nr_pages);
 	args->cpages = 0;
 	args->npages = 0;
 
-- 
2.20.1
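
The caller-side obligation this introduces looks roughly like the sketch
below (note that sizeof(), not ARRAY_SIZE(), is the byte count memset()
needs; patch 5 later adjusts the test_hmm.c hunk above accordingly):

	unsigned long src_pfns[64];
	unsigned long dst_pfns[64];
	struct migrate_vma args;
	int ret;

	/* migrate_vma_setup() no longer zeroes the arrays for the caller. */
	memset(src_pfns, 0, sizeof(src_pfns));
	memset(dst_pfns, 0, sizeof(dst_pfns));

	args.src = src_pfns;
	args.dst = dst_pfns;
	/* ... fill in vma, start, end, flags, pgmap_owner ... */
	ret = migrate_vma_setup(&args);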
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 3/9] mm/migrate: Add a unmap and pin migration mode
Some drivers need to ensure that a device has access to a particular user page whilst preventing userspace access to that page. For example this is required to allow a driver to implement atomic access to a page when the device hardware does not support atomic access to system memory. This could be implemented by migrating the data to the device, however this is not always optimal and may fail in some circumstances. In these cases it is advantageous to remap the page for device access without actually migrating the data. To allow this kind of access introduce an unmap and pin flag called MIGRATE_PFN_PIN/UNPIN for migration pfns. This will cause the original page to be remapped to the provided device private page as normal, but instead of returning or freeing the original CPU page it will pin it and leave it isolated from the LRU. This ensures the page remains pinned so that a device may access it exclusively. Any userspace CPU accesses will fault and trigger the normal device private migrate_to_ram() callback which must migrate the mapping back to the original page, after which the device will no longer have exclusive access to the page. As the original page does not get freed it is safe to allow the unmap and pin operation to proceed in cases where there are extra page references present. Signed-off-by: Alistair Popple <apopple at nvidia.com> --- include/linux/migrate.h | 2 + include/linux/migrate_mode.h | 1 + mm/migrate.c | 74 +++++++++++++++++++++++++----------- 3 files changed, 54 insertions(+), 23 deletions(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 4594838a0f7c..449fc61f9a99 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -144,6 +144,8 @@ static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm, #define MIGRATE_PFN_MIGRATE (1UL << 1) #define MIGRATE_PFN_LOCKED (1UL << 2) #define MIGRATE_PFN_WRITE (1UL << 3) +#define MIGRATE_PFN_PIN (1UL << 4) +#define MIGRATE_PFN_UNPIN (1UL << 4) #define MIGRATE_PFN_SHIFT 6 static inline struct page *migrate_pfn_to_page(unsigned long mpfn) diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h index 883c99249033..823497eda927 100644 --- a/include/linux/migrate_mode.h +++ b/include/linux/migrate_mode.h @@ -17,6 +17,7 @@ enum migrate_mode { MIGRATE_SYNC_LIGHT, MIGRATE_SYNC, MIGRATE_SYNC_NO_COPY, + MIGRATE_REFERENCED, }; #endif /* MIGRATE_MODE_H_INCLUDED */ diff --git a/mm/migrate.c b/mm/migrate.c index fe8bb322e2e3..71edc2679c8e 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -410,7 +410,7 @@ int migrate_page_move_mapping(struct address_space *mapping, * never have extra references except during migration, but it * is safe to ignore these. 
*/ - if (!is_device_private_page(page) && + if (!is_device_private_page(page) && extra_count >= 0 && page_count(page) != expected_count) return -EAGAIN; @@ -421,6 +421,8 @@ int migrate_page_move_mapping(struct address_space *mapping, __SetPageSwapBacked(newpage); return MIGRATEPAGE_SUCCESS; + } else if (extra_count < 0) { + return -EINVAL; } oldzone = page_zone(page); @@ -704,12 +706,15 @@ int migrate_page(struct address_space *mapping, BUG_ON(PageWriteback(page)); /* Writeback must be complete */ - rc = migrate_page_move_mapping(mapping, newpage, page, 0); + if (mode == MIGRATE_REFERENCED) + rc = migrate_page_move_mapping(mapping, newpage, page, -1); + else + rc = migrate_page_move_mapping(mapping, newpage, page, 0); if (rc != MIGRATEPAGE_SUCCESS) return rc; - if (mode != MIGRATE_SYNC_NO_COPY) + if (mode != MIGRATE_SYNC_NO_COPY && mode != MIGRATE_REFERENCED) migrate_page_copy(newpage, page); else migrate_page_states(newpage, page); @@ -2327,15 +2332,15 @@ static int migrate_vma_collect_hole(unsigned long start, if (!vma_is_anonymous(walk->vma)) { for (addr = start; addr < end; addr += PAGE_SIZE) { migrate->src[migrate->npages] = 0; - migrate->dst[migrate->npages] = 0; migrate->npages++; } return 0; } for (addr = start; addr < end; addr += PAGE_SIZE) { - migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE; - migrate->dst[migrate->npages] = 0; + if (vma_is_anonymous(walk->vma) && + !(migrate->src[migrate->npages] & MIGRATE_PFN_PIN)) + migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE; migrate->npages++; migrate->cpages++; } @@ -2425,7 +2430,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, pte = *ptep; if (pte_none(pte)) { - if (vma_is_anonymous(vma)) { + if (vma_is_anonymous(vma) && + !(migrate->src[migrate->npages] & MIGRATE_PFN_PIN)) { mpfn = MIGRATE_PFN_MIGRATE; migrate->cpages++; } @@ -2525,8 +2531,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, } next: - migrate->dst[migrate->npages] = 0; - migrate->src[migrate->npages++] = mpfn; + migrate->src[migrate->npages++] |= mpfn; } arch_leave_lazy_mmu_mode(); pte_unmap_unlock(ptep - 1, ptl); @@ -2695,7 +2700,13 @@ static void migrate_vma_prepare(struct migrate_vma *migrate) put_page(page); } - if (!migrate_vma_check_page(page)) { + /* + * If the page is being unmapped and pinned it isn't actually + * going to migrate, so it's safe to continue the operation with + * an elevated refcount. 
+ */ + if (!migrate_vma_check_page(page) && + !(migrate->src[i] & MIGRATE_PFN_PIN)) { if (remap) { migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; migrate->cpages--; @@ -2757,25 +2768,34 @@ static void migrate_vma_unmap(struct migrate_vma *migrate) if (!page || !(migrate->src[i] & MIGRATE_PFN_MIGRATE)) continue; - if (page_mapped(page)) { + if (page_mapped(page)) try_to_unmap(page, flags); - if (page_mapped(page)) - goto restore; + + if (page_mapped(page)) + migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; + + if (!migrate_vma_check_page(page) && + !(migrate->src[i] & MIGRATE_PFN_PIN)) + migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; + + if (migrate->src[i] & MIGRATE_PFN_PIN) { + if (page_maybe_dma_pinned(page)) + migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; + else + page_ref_add(page, GUP_PIN_COUNTING_BIAS); } - if (migrate_vma_check_page(page)) + if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE)) { + migrate->cpages--; + restore++; continue; - -restore: - migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; - migrate->cpages--; - restore++; + } } for (addr = start, i = 0; i < npages && restore; addr += PAGE_SIZE, i++) { struct page *page = migrate_pfn_to_page(migrate->src[i]); - if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE)) + if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE)) continue; remove_migration_ptes(page, page, false); @@ -3092,7 +3112,11 @@ void migrate_vma_pages(struct migrate_vma *migrate) } } - r = migrate_page(mapping, newpage, page, MIGRATE_SYNC_NO_COPY); + if (migrate->src[i] & MIGRATE_PFN_PIN) + r = migrate_page(mapping, newpage, page, MIGRATE_REFERENCED); + else + r = migrate_page(mapping, newpage, page, MIGRATE_SYNC_NO_COPY); + if (r != MIGRATEPAGE_SUCCESS) migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; } @@ -3148,15 +3172,19 @@ void migrate_vma_finalize(struct migrate_vma *migrate) if (is_zone_device_page(page)) put_page(page); - else + else if (!(migrate->src[i] & MIGRATE_PFN_PIN)) putback_lru_page(page); if (newpage != page) { unlock_page(newpage); if (is_zone_device_page(newpage)) put_page(newpage); - else + else { + if (migrate->dst[i] & MIGRATE_PFN_UNPIN) + page_ref_sub(newpage, GUP_PIN_COUNTING_BIAS); + putback_lru_page(newpage); + } } } } -- 2.20.1
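To summarise the flag semantics added above: MIGRATE_PFN_PIN is an input flag
that the caller sets in the src array before migrate_vma_setup() to request
unmap-and-pin, whereas MIGRATE_PFN_UNPIN is set by the driver in the dst
array on the later remap to drop that pin, which is why the two can share
bit 4. Roughly:

/* Requesting exclusive access: input flag in the src array. */
src[i] = MIGRATE_PFN_PIN;

/* Handing the pinned page back (e.g. from migrate_to_ram()): dst array. */
dst[i] = migrate_pfn(page_to_pfn(spage)) |
	 MIGRATE_PFN_LOCKED | MIGRATE_PFN_UNPIN;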
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 4/9] Documentation: Add unmap and pin to HMM
Update the HMM documentation to include information on the unmap and pin
operation.

Signed-off-by: Alistair Popple <apopple at nvidia.com>
---
 Documentation/vm/hmm.rst | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 09e28507f5b2..83234984f656 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -346,7 +346,15 @@ between device driver specific code and shared common code:
    from the LRU (if system memory since device private pages are not on the
    LRU), unmapped from the process, and a special migration PTE is inserted
    in place of the original PTE.
-   migrate_vma_setup() also clears the ``args->dst`` array.
+
+   A device driver may also initialise ``src`` entries with the
+   ``MIGRATE_PFN_PIN`` flag. This allows the device driver to unmap and pin
+   the existing system page in place whilst migrating page metadata to a
+   device private page. This leaves the page isolated from the LRU and gives
+   the device exclusive access to the page data without the need to migrate
+   data as any CPU access will trigger a fault. The device driver needs to
+   keep track of the ``src`` page as it effectively becomes the owner of
+   the page and needs to pass it in when remapping and unpinning the page.
 
 3. The device driver allocates destination pages and copies source pages to
    destination pages.
@@ -357,8 +365,8 @@ between device driver specific code and shared common code:
    array for that page.
 
    The driver then allocates either a device private struct page or a
-   system memory page, locks the page with ``lock_page()``, and fills in the
-   ``dst`` array entry with::
+   system memory page, locks the page with ``lock_page()``, and fills in
+   the ``dst`` array entry with::
 
      dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
 
@@ -373,6 +381,14 @@ between device driver specific code and shared common code:
    destination or clear the destination device private memory if the pointer
    is ``NULL`` meaning the source page was not populated in system memory.
 
+   Alternatively a driver that is remapping and unpinning a source page
+   obtained from a ``MIGRATE_PFN_PIN`` operation should lock the original
+   source page and pass it in along with the ``MIGRATE_PFN_UNPIN`` flag
+   without any need to copy data::
+
+     dst[i] = migrate_pfn(page_to_pfn(spage)) | MIGRATE_PFN_LOCKED
+		| MIGRATE_PFN_UNPIN;
+
 4. ``migrate_vma_pages()``
 
    This step is where the migration is actually "committed".
-- 
2.20.1
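
A minimal sketch of the "remap and unpin" path the documentation describes,
i.e. the driver's migrate_to_ram() callback handing the pinned system page
back to the CPU. The my_dev_*() names and the owner pointer are hypothetical,
retries and error handling are omitted, and a real driver would also revoke
the device's atomic mapping before migrate_vma_pages():

static vm_fault_t my_dev_migrate_to_ram(struct vm_fault *vmf)
{
	unsigned long src_pfn = 0, dst_pfn = 0;
	struct migrate_vma args = {
		.vma		= vmf->vma,
		.src		= &src_pfn,
		.dst		= &dst_pfn,
		.start		= vmf->address,
		.end		= vmf->address + PAGE_SIZE,
		.pgmap_owner	= my_dev_owner,	/* hypothetical: dev_pagemap owner */
		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
	};
	struct page *spage;

	if (migrate_vma_setup(&args))
		return VM_FAULT_SIGBUS;

	if (src_pfn & MIGRATE_PFN_MIGRATE) {
		/* The system page that was unmapped and pinned earlier. */
		spage = my_dev_lookup_pinned_page(migrate_pfn_to_page(src_pfn));
		lock_page(spage);
		dst_pfn = migrate_pfn(page_to_pfn(spage)) |
			  MIGRATE_PFN_LOCKED | MIGRATE_PFN_UNPIN;
	}

	/* Revoke the device's exclusive/atomic mapping here. */
	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);

	return 0;
}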
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 5/9] hmm-tests: Add test for unmap and pin
Adds a basic test of the HMM unmap and pin operation. Signed-off-by: Alistair Popple <apopple at nvidia.com> --- lib/test_hmm.c | 107 +++++++++++++++++++++---- lib/test_hmm_uapi.h | 1 + tools/testing/selftests/vm/hmm-tests.c | 49 +++++++++++ 3 files changed, 140 insertions(+), 17 deletions(-) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 98848b96ff09..c78a473250a3 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -46,6 +46,7 @@ struct dmirror_bounce { unsigned long cpages; }; +#define DPT_XA_TAG_ATOMIC 1UL #define DPT_XA_TAG_WRITE 3UL /* @@ -83,6 +84,7 @@ struct dmirror_device { struct cdev cdevice; struct hmm_devmem *devmem; + unsigned int devmem_faults; unsigned int devmem_capacity; unsigned int devmem_count; struct dmirror_chunk **devmem_chunks; @@ -203,8 +205,18 @@ static void dmirror_do_update(struct dmirror *dmirror, unsigned long start, * Therefore, it is OK to just clear the entry. */ xa_for_each_range(&dmirror->pt, pfn, entry, start >> PAGE_SHIFT, - end >> PAGE_SHIFT) + end >> PAGE_SHIFT) { + /* + * Typically this would be done in devmap free page, but as + * we're using the XArray to store the reference to the original + * page do it here as it doesn't matter if clean up of the + * pinned page is delayed. + */ + if (xa_pointer_tag(entry) == DPT_XA_TAG_ATOMIC) + unpin_user_page(xa_untag_pointer(entry)); + xa_erase(&dmirror->pt, pfn); + } } static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni, @@ -571,7 +583,8 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) } static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, - struct dmirror *dmirror) + struct dmirror *dmirror, + int allow_ref) { struct dmirror_device *mdevice = dmirror->mdevice; const unsigned long *src = args->src; @@ -598,9 +611,17 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, continue; rpage = dpage->zone_device_data; - if (spage) + if (spage && !(*src & MIGRATE_PFN_PIN)) copy_highpage(rpage, spage); else + /* + * In the MIGRATE_PFN_PIN case we don't really + * need rpage at all because the existing page is + * staying in place and will be mapped. However we need + * somewhere to store dmirror and that place is + * rpage->zone_device_data so we keep it for + * simplicity. + */ clear_highpage(rpage); /* @@ -620,7 +641,8 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, } static int dmirror_migrate_finalize_and_map(struct migrate_vma *args, - struct dmirror *dmirror) + struct dmirror *dmirror, + int allow_ref) { unsigned long start = args->start; unsigned long end = args->end; @@ -647,8 +669,14 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args, * Store the page that holds the data so the page table * doesn't have to deal with ZONE_DEVICE private pages. 
*/ - entry = dpage->zone_device_data; - if (*dst & MIGRATE_PFN_WRITE) + if (*src & MIGRATE_PFN_PIN) + entry = migrate_pfn_to_page(*src); + else + entry = dpage->zone_device_data; + + if (*src & MIGRATE_PFN_PIN) + entry = xa_tag_pointer(entry, DPT_XA_TAG_ATOMIC); + else if (*dst & MIGRATE_PFN_WRITE) entry = xa_tag_pointer(entry, DPT_XA_TAG_WRITE); entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC); if (xa_is_err(entry)) { @@ -662,7 +690,8 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args, } static int dmirror_migrate(struct dmirror *dmirror, - struct hmm_dmirror_cmd *cmd) + struct hmm_dmirror_cmd *cmd, + int allow_ref) { unsigned long start, end, addr; unsigned long size = cmd->npages << PAGE_SHIFT; @@ -673,7 +702,7 @@ static int dmirror_migrate(struct dmirror *dmirror, struct dmirror_bounce bounce; struct migrate_vma args; unsigned long next; - int ret; + int i, ret; start = cmd->addr; end = start + size; @@ -696,8 +725,13 @@ static int dmirror_migrate(struct dmirror *dmirror, if (next > vma->vm_end) next = vma->vm_end; - memset(src_pfns, 0, ARRAY_SIZE(src_pfns)); - memset(dst_pfns, 0, ARRAY_SIZE(dst_pfns)); + if (allow_ref) + for (i = 0; i < 64; ++i) + src_pfns[i] = MIGRATE_PFN_PIN; + else + memset(src_pfns, 0, sizeof(src_pfns)); + memset(dst_pfns, 0, sizeof(dst_pfns)); + args.vma = vma; args.src = src_pfns; args.dst = dst_pfns; @@ -709,9 +743,9 @@ static int dmirror_migrate(struct dmirror *dmirror, if (ret) goto out; - dmirror_migrate_alloc_and_copy(&args, dmirror); + dmirror_migrate_alloc_and_copy(&args, dmirror, allow_ref); migrate_vma_pages(&args); - dmirror_migrate_finalize_and_map(&args, dmirror); + dmirror_migrate_finalize_and_map(&args, dmirror, allow_ref); migrate_vma_finalize(&args); } mmap_read_unlock(mm); @@ -739,6 +773,28 @@ static int dmirror_migrate(struct dmirror *dmirror, return ret; } +static int dmirror_migrate_pin(struct dmirror *dmirror, + struct hmm_dmirror_cmd *cmd) +{ + void *tmp; + int nr_pages = cmd->npages; + int ret; + + ret = dmirror_migrate(dmirror, cmd, true); + + tmp = kmalloc(nr_pages << PAGE_SHIFT, GFP_KERNEL); + if (!tmp) + return -ENOMEM; + + /* Make sure user access faults */ + dmirror->mdevice->devmem_faults = 0; + if (copy_from_user(tmp, u64_to_user_ptr(cmd->addr), nr_pages << PAGE_SHIFT)) + ret = -EFAULT; + cmd->faults = dmirror->mdevice->devmem_faults; + + return ret; +} + static void dmirror_mkentry(struct dmirror *dmirror, struct hmm_range *range, unsigned char *perm, unsigned long entry) { @@ -948,7 +1004,11 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp, break; case HMM_DMIRROR_MIGRATE: - ret = dmirror_migrate(dmirror, &cmd); + ret = dmirror_migrate(dmirror, &cmd, false); + break; + + case HMM_DMIRROR_MIGRATE_PIN: + ret = dmirror_migrate_pin(dmirror, &cmd); break; case HMM_DMIRROR_SNAPSHOT: @@ -1004,20 +1064,31 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args, for (addr = start; addr < end; addr += PAGE_SIZE, src++, dst++) { struct page *dpage, *spage; + void *entry; spage = migrate_pfn_to_page(*src); if (!spage || !(*src & MIGRATE_PFN_MIGRATE)) continue; - spage = spage->zone_device_data; - dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr); + entry = xa_load(&dmirror->pt, addr >> PAGE_SHIFT); + if (entry && xa_pointer_tag(entry) == DPT_XA_TAG_ATOMIC) { + spage = NULL; + dpage = xa_untag_pointer(entry); + *dst = migrate_pfn(page_to_pfn(dpage)) | + MIGRATE_PFN_LOCKED | MIGRATE_PFN_UNPIN; + } else { + spage = spage->zone_device_data; + dpage = 
alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr); + *dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + } + if (!dpage) continue; lock_page(dpage); xa_erase(&dmirror->pt, addr >> PAGE_SHIFT); - copy_highpage(dpage, spage); - *dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + if (spage) + copy_highpage(dpage, spage); if (*src & MIGRATE_PFN_WRITE) *dst |= MIGRATE_PFN_WRITE; } @@ -1041,6 +1112,8 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf) rpage = vmf->page->zone_device_data; dmirror = rpage->zone_device_data; + dmirror->mdevice->devmem_faults++; + /* FIXME demonstrate how we can adjust migrate range */ args.vma = vmf->vma; args.start = vmf->address; diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h index 670b4ef2a5b6..b40f4e6affe0 100644 --- a/lib/test_hmm_uapi.h +++ b/lib/test_hmm_uapi.h @@ -33,6 +33,7 @@ struct hmm_dmirror_cmd { #define HMM_DMIRROR_WRITE _IOWR('H', 0x01, struct hmm_dmirror_cmd) #define HMM_DMIRROR_MIGRATE _IOWR('H', 0x02, struct hmm_dmirror_cmd) #define HMM_DMIRROR_SNAPSHOT _IOWR('H', 0x03, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_MIGRATE_PIN _IOWR('H', 0x04, struct hmm_dmirror_cmd) /* * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT. diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c index 5d1ac691b9f4..7111ebab93c7 100644 --- a/tools/testing/selftests/vm/hmm-tests.c +++ b/tools/testing/selftests/vm/hmm-tests.c @@ -947,6 +947,55 @@ TEST_F(hmm, migrate_fault) hmm_buffer_free(buffer); } +TEST_F(hmm, migrate_fault_pin) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + int *ptr; + int ret; + + npages = ALIGN(HMM_BUFFER_SIZE, self->page_size) >> self->page_shift; + ASSERT_NE(npages, 0); + size = npages << self->page_shift; + + buffer = malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd = -1; + buffer->size = size; + buffer->mirror = malloc(size); + ASSERT_NE(buffer->mirror, NULL); + + buffer->ptr = mmap(NULL, size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, + buffer->fd, 0); + ASSERT_NE(buffer->ptr, MAP_FAILED); + + /* Initialize buffer in system memory. */ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) + ptr[i] = i; + + /* Migrate memory to device. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE_PIN, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], i); + + ASSERT_EQ(buffer->faults, npages); + + /* Fault pages back to system memory and check them. */ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], i); + + hmm_buffer_free(buffer); +} + /* * Migrate anonymous shared memory to device private memory. */ -- 2.20.1
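The bookkeeping trick in the test above is worth calling out: the pinned
system page is remembered in the mirror's XArray using XArray pointer tags,
so the fault and invalidate paths can tell a pinned entry apart from a normal
device page and drop the pin exactly once. Condensed from the hunks above:

/* Remember the pinned page, tagged so it can be recognised later. */
entry = xa_tag_pointer(pinned_page, DPT_XA_TAG_ATOMIC);
xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);

/* On invalidation or teardown: unpin before dropping the entry. */
entry = xa_load(&dmirror->pt, pfn);
if (entry && xa_pointer_tag(entry) == DPT_XA_TAG_ATOMIC)
	unpin_user_page(xa_untag_pointer(entry));
xa_erase(&dmirror->pt, pfn);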
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 6/9] nouveau/dmem: Only map migrating pages
Only pages which were actually migrated should be mapped on the GPU.
migrate_vma_pages() clears MIGRATE_PFN_MIGRATE in the src_pfn array for any
page that could not be migrated, so test this prior to mapping the pages on
the GPU. If any pages failed to migrate, don't install any mappings; the GPU
will demand fault them as required.

Signed-off-by: Alistair Popple <apopple at nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 92987daa5e17..9579bd001f11 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -618,8 +618,9 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
 		dma_addr_t *dma_addrs, u64 *pfns)
 {
 	struct nouveau_fence *fence;
-	unsigned long addr = args->start, nr_dma = 0, i;
+	unsigned long addr = args->start, nr_dma = 0, i, npages;
 
+	npages = (args->end - args->start) >> PAGE_SHIFT;
 	for (i = 0; addr < args->end; i++) {
 		args->dst[i] = nouveau_dmem_migrate_copy_one(drm, svmm,
 				args->src[i], dma_addrs + nr_dma, pfns + i);
@@ -631,7 +632,17 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
 	nouveau_fence_new(drm->dmem->migrate.chan, false, &fence);
 	migrate_vma_pages(args);
 	nouveau_dmem_fence_done(&fence);
-	nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i);
+
+	for (i = 0; i < npages; i++)
+		if (!(args->src[i] & MIGRATE_PFN_MIGRATE))
+			break;
+
+	/*
+	 * If all pages were migrated successfully map them on the GPU. If any
+	 * failed just let the GPU fault to create the mapping.
+	 */
+	if (i == npages)
+		nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, npages);
 
 	while (nr_dma--) {
 		dma_unmap_page(drm->dev->dev, dma_addrs[nr_dma], PAGE_SIZE,
-- 
2.20.1
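
The added check amounts to a small predicate over the src array once
migrate_vma_pages() has run; factored out as a hypothetical helper with the
same logic as the loop above:

/* Return true only if every page in the range migrated successfully. */
static bool my_all_pages_migrated(const unsigned long *src,
				  unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		if (!(src[i] & MIGRATE_PFN_MIGRATE))
			return false;

	return true;
}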
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 7/9] nouveau/svm: Refactor nouveau_range_fault
Call mmu_interval_notifier_insert() as part of nouveau_range_fault(). This doesn't introduce any functional change but makes it easier for a subsequent patch to alter the behaviour of nouveau_range_fault() to support GPU atomic operations. Signed-off-by: Alistair Popple <apopple at nvidia.com> --- drivers/gpu/drm/nouveau/nouveau_svm.c | 34 ++++++++++++++++----------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index 1c3f890377d2..63332387402e 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -567,18 +567,27 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm, unsigned long hmm_pfns[1]; struct hmm_range range = { .notifier = ¬ifier->notifier, - .start = notifier->notifier.interval_tree.start, - .end = notifier->notifier.interval_tree.last + 1, .default_flags = hmm_flags, .hmm_pfns = hmm_pfns, .dev_private_owner = drm->dev, }; - struct mm_struct *mm = notifier->notifier.mm; + struct mm_struct *mm = svmm->notifier.mm; int ret; + ret = mmu_interval_notifier_insert(¬ifier->notifier, mm, + args->p.addr, args->p.size, + &nouveau_svm_mni_ops); + if (ret) + return ret; + + range.start = notifier->notifier.interval_tree.start; + range.end = notifier->notifier.interval_tree.last + 1; + while (true) { - if (time_after(jiffies, timeout)) - return -EBUSY; + if (time_after(jiffies, timeout)) { + ret = -EBUSY; + goto out; + } range.notifier_seq = mmu_interval_read_begin(range.notifier); mmap_read_lock(mm); @@ -587,7 +596,7 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm, if (ret) { if (ret == -EBUSY) continue; - return ret; + goto out; } mutex_lock(&svmm->mutex); @@ -606,6 +615,9 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm, svmm->vmm->vmm.object.client->super = false; mutex_unlock(&svmm->mutex); +out: + mmu_interval_notifier_remove(¬ifier->notifier); + return ret; } @@ -727,14 +739,8 @@ nouveau_svm_fault(struct nvif_notify *notify) } notifier.svmm = svmm; - ret = mmu_interval_notifier_insert(¬ifier.notifier, mm, - args.i.p.addr, args.i.p.size, - &nouveau_svm_mni_ops); - if (!ret) { - ret = nouveau_range_fault(svmm, svm->drm, &args.i, - sizeof(args), hmm_flags, ¬ifier); - mmu_interval_notifier_remove(¬ifier.notifier); - } + ret = nouveau_range_fault(svmm, svm->drm, &args.i, + sizeof(args), hmm_flags, ¬ifier); mmput(mm); limit = args.i.p.addr + args.i.p.size; -- 2.20.1
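The net effect on the caller is that the notifier's lifetime now lives
entirely inside nouveau_range_fault(); the call pattern changes roughly as
follows (taken from the hunks above):

/* Before: the caller wrapped the fault in insert/remove. */
ret = mmu_interval_notifier_insert(&notifier.notifier, mm,
				   args.i.p.addr, args.i.p.size,
				   &nouveau_svm_mni_ops);
if (!ret) {
	ret = nouveau_range_fault(svmm, svm->drm, &args.i,
				  sizeof(args), hmm_flags, &notifier);
	mmu_interval_notifier_remove(&notifier.notifier);
}

/* After: nouveau_range_fault() inserts and removes the notifier itself. */
ret = nouveau_range_fault(svmm, svm->drm, &args.i,
			  sizeof(args), hmm_flags, &notifier);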
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 8/9] nouveau/dmem: Add support for multiple page types
Device private pages are used to track a per-page migrate_to_ram() callback which is called when the CPU attempts to access a GPU page from the CPU. Currently the same callback is used for all GPU pages tracked by Nouveau. However a future patch requires support for calling a different callback when accessing some GPU pages. This patch extends the existing Nouveau device private page allocator to make it easier to allocate device private pages with different callbacks but should not introduce any functional changes. Signed-off-by: Alistair Popple <apopple at nvidia.com> --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 27 ++++++++++++++------------ drivers/gpu/drm/nouveau/nouveau_dmem.h | 5 +++++ 2 files changed, 20 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 9579bd001f11..8fb4949f3778 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -67,6 +67,7 @@ struct nouveau_dmem_chunk { struct nouveau_bo *bo; struct nouveau_drm *drm; unsigned long callocated; + enum nouveau_dmem_type type; struct dev_pagemap pagemap; }; @@ -81,7 +82,7 @@ struct nouveau_dmem { struct nouveau_dmem_migrate migrate; struct list_head chunks; struct mutex mutex; - struct page *free_pages; + struct page *free_pages[NOUVEAU_DMEM_NTYPES]; spinlock_t lock; }; @@ -112,8 +113,8 @@ static void nouveau_dmem_page_free(struct page *page) struct nouveau_dmem *dmem = chunk->drm->dmem; spin_lock(&dmem->lock); - page->zone_device_data = dmem->free_pages; - dmem->free_pages = page; + page->zone_device_data = dmem->free_pages[chunk->type]; + dmem->free_pages[chunk->type] = page; WARN_ON(!chunk->callocated); chunk->callocated--; @@ -224,7 +225,8 @@ static const struct dev_pagemap_ops nouveau_dmem_pagemap_ops = { }; static int -nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage) +nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage, + enum nouveau_dmem_type type) { struct nouveau_dmem_chunk *chunk; struct resource *res; @@ -248,6 +250,7 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage) } chunk->drm = drm; + chunk->type = type; chunk->pagemap.type = MEMORY_DEVICE_PRIVATE; chunk->pagemap.range.start = res->start; chunk->pagemap.range.end = res->end; @@ -279,8 +282,8 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage) page = pfn_to_page(pfn_first); spin_lock(&drm->dmem->lock); for (i = 0; i < DMEM_CHUNK_NPAGES - 1; ++i, ++page) { - page->zone_device_data = drm->dmem->free_pages; - drm->dmem->free_pages = page; + page->zone_device_data = drm->dmem->free_pages[type]; + drm->dmem->free_pages[type] = page; } *ppage = page; chunk->callocated++; @@ -304,22 +307,22 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage) } static struct page * -nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm) +nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, enum nouveau_dmem_type type) { struct nouveau_dmem_chunk *chunk; struct page *page = NULL; int ret; spin_lock(&drm->dmem->lock); - if (drm->dmem->free_pages) { - page = drm->dmem->free_pages; - drm->dmem->free_pages = page->zone_device_data; + if (drm->dmem->free_pages[type]) { + page = drm->dmem->free_pages[type]; + drm->dmem->free_pages[type] = page->zone_device_data; chunk = nouveau_page_to_chunk(page); chunk->callocated++; spin_unlock(&drm->dmem->lock); } else { spin_unlock(&drm->dmem->lock); - ret = nouveau_dmem_chunk_alloc(drm, &page); + ret = 
nouveau_dmem_chunk_alloc(drm, &page, type); if (ret) return NULL; } @@ -577,7 +580,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm, if (!(src & MIGRATE_PFN_MIGRATE)) goto out; - dpage = nouveau_dmem_page_alloc_locked(drm); + dpage = nouveau_dmem_page_alloc_locked(drm, NOUVEAU_DMEM); if (!dpage) goto out; diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.h b/drivers/gpu/drm/nouveau/nouveau_dmem.h index 64da5d3635c8..02e261c4acf1 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.h +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.h @@ -28,6 +28,11 @@ struct nouveau_drm; struct nouveau_svmm; struct hmm_range; +enum nouveau_dmem_type { + NOUVEAU_DMEM, + NOUVEAU_DMEM_NTYPES, /* Number of types, must be last */ +}; + #if IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM) void nouveau_dmem_init(struct nouveau_drm *); void nouveau_dmem_fini(struct nouveau_drm *); -- 2.20.1
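In short, the allocator is now keyed by page type, so adding another kind of
device private page (as the next patch does for atomic access) only requires
a new enum value plus its own dev_pagemap ops; a condensed view:

enum nouveau_dmem_type {
	NOUVEAU_DMEM,		/* VRAM-backed pages, data is migrated */
	NOUVEAU_ATOMIC,		/* added by patch 9: tracks pinned system pages */
	NOUVEAU_DMEM_NTYPES,	/* number of types, must be last */
};

/* Each type has its own free list; callers pick the type they need. */
dpage = nouveau_dmem_page_alloc_locked(drm, NOUVEAU_DMEM);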
Alistair Popple
2021-Feb-09 01:07 UTC
[Nouveau] [PATCH 9/9] nouveau/svm: Implement atomic SVM access
Some NVIDIA GPUs do not support direct atomic access to system memory via PCIe. Instead this must be emulated by granting the GPU exclusive access to the memory. This is achieved by migrating the userspace mappings to device private pages whilst leaving the actual page in place. The driver then grants the GPU permission to update the page undergoing atomic access via the GPU page tables. When CPU access to the page is required a CPU fault is raised which calls into the device driver via migrate_to_ram() to remap the page into the CPU page tables and revoke GPU access. Signed-off-by: Alistair Popple <apopple at nvidia.com> --- drivers/gpu/drm/nouveau/include/nvif/if000c.h | 1 + drivers/gpu/drm/nouveau/nouveau_dmem.c | 148 ++++++++++++++++-- drivers/gpu/drm/nouveau/nouveau_dmem.h | 4 + drivers/gpu/drm/nouveau/nouveau_svm.c | 116 ++++++++++++-- drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h | 1 + .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 6 + 6 files changed, 249 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h b/drivers/gpu/drm/nouveau/include/nvif/if000c.h index d6dd40f21eed..9c7ff56831c5 100644 --- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h +++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h @@ -77,6 +77,7 @@ struct nvif_vmm_pfnmap_v0 { #define NVIF_VMM_PFNMAP_V0_APER 0x00000000000000f0ULL #define NVIF_VMM_PFNMAP_V0_HOST 0x0000000000000000ULL #define NVIF_VMM_PFNMAP_V0_VRAM 0x0000000000000010ULL +#define NVIF_VMM_PFNMAP_V0_A 0x0000000000000004ULL #define NVIF_VMM_PFNMAP_V0_W 0x0000000000000002ULL #define NVIF_VMM_PFNMAP_V0_V 0x0000000000000001ULL #define NVIF_VMM_PFNMAP_V0_NONE 0x0000000000000000ULL diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 8fb4949f3778..7b103670af56 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -69,6 +69,7 @@ struct nouveau_dmem_chunk { unsigned long callocated; enum nouveau_dmem_type type; struct dev_pagemap pagemap; + struct page **atomic_pages; }; struct nouveau_dmem_migrate { @@ -107,12 +108,51 @@ unsigned long nouveau_dmem_page_addr(struct page *page) return chunk->bo->offset + off; } +static struct page *nouveau_dpage_to_atomic(struct page *dpage) +{ + struct nouveau_dmem_chunk *chunk = nouveau_page_to_chunk(dpage); + int index; + + if (WARN_ON_ONCE(chunk->type != NOUVEAU_ATOMIC)) + return NULL; + + index = page_to_pfn(dpage) - (chunk->pagemap.range.start >> PAGE_SHIFT); + + if (WARN_ON_ONCE(index > DMEM_CHUNK_NPAGES)) + return NULL; + + return chunk->atomic_pages[index]; +} + +void nouveau_dmem_set_atomic(struct page *dpage, struct page *atomic_page) +{ + struct nouveau_dmem_chunk *chunk = nouveau_page_to_chunk(dpage); + int index; + + if (WARN_ON_ONCE(chunk->type != NOUVEAU_ATOMIC)) + return; + + index = page_to_pfn(dpage) - (chunk->pagemap.range.start >> PAGE_SHIFT); + + if (WARN_ON_ONCE(index > DMEM_CHUNK_NPAGES)) + return; + + chunk->atomic_pages[index] = atomic_page; +} + static void nouveau_dmem_page_free(struct page *page) { struct nouveau_dmem_chunk *chunk = nouveau_page_to_chunk(page); struct nouveau_dmem *dmem = chunk->drm->dmem; spin_lock(&dmem->lock); + if (chunk->type == NOUVEAU_ATOMIC) { + struct page *atomic_page; + + atomic_page = nouveau_dpage_to_atomic(page); + if (atomic_page) + unpin_user_page(atomic_page); + } page->zone_device_data = dmem->free_pages[chunk->type]; dmem->free_pages[chunk->type] = page; @@ -125,6 +165,65 @@ static void nouveau_dmem_page_free(struct page *page) 
spin_unlock(&dmem->lock); } +static vm_fault_t nouveau_atomic_page_migrate(struct vm_fault *vmf) +{ + struct nouveau_drm *drm = page_to_drm(vmf->page); + struct page *dpage = vmf->page; + struct nouveau_svmm *svmm = dpage->zone_device_data; + struct migrate_vma args; + unsigned long src_pfn = 0, dst_pfn = 0; + struct page *src_page, *old_page; + int retry; + int ret = 0; + + args.vma = vmf->vma; + args.src = &src_pfn; + args.dst = &dst_pfn; + args.start = vmf->address; + args.end = vmf->address + PAGE_SIZE; + args.pgmap_owner = drm->dev; + args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; + + for (retry = 0; retry < 10; retry++) { + ret = migrate_vma_setup(&args); + if (ret) { + ret = VM_FAULT_SIGBUS; + goto out; + } + + if (src_pfn & MIGRATE_PFN_MIGRATE) + break; + + cond_resched(); + } + + src_page = migrate_pfn_to_page(src_pfn); + old_page = nouveau_dpage_to_atomic(dpage); + + /* + * If we can't get a lock on the atomic page it means another thread is + * racing or migrating so retry the fault. + */ + if (src_page && trylock_page(old_page)) + dst_pfn = migrate_pfn(page_to_pfn(old_page)) | + MIGRATE_PFN_LOCKED | MIGRATE_PFN_UNPIN; + else + ret = VM_FAULT_RETRY; + + migrate_vma_pages(&args); + if (src_pfn & MIGRATE_PFN_MIGRATE) { + nouveau_svmm_invalidate(svmm, args.start, args.end); + nouveau_dmem_set_atomic(dpage, NULL); + } + migrate_vma_finalize(&args); + +out: + if (ret == VM_FAULT_RETRY) + mmap_read_unlock(vmf->vma->vm_mm); + + return ret; +} + static void nouveau_dmem_fence_done(struct nouveau_fence **fence) { if (fence) { @@ -224,6 +323,11 @@ static const struct dev_pagemap_ops nouveau_dmem_pagemap_ops = { .migrate_to_ram = nouveau_dmem_migrate_to_ram, }; +static const struct dev_pagemap_ops nouveau_atomic_pagemap_ops = { + .page_free = nouveau_dmem_page_free, + .migrate_to_ram = nouveau_atomic_page_migrate, +}; + static int nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage, enum nouveau_dmem_type type) @@ -255,18 +359,30 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage, chunk->pagemap.range.start = res->start; chunk->pagemap.range.end = res->end; chunk->pagemap.nr_range = 1; - chunk->pagemap.ops = &nouveau_dmem_pagemap_ops; + if (type == NOUVEAU_DMEM) + chunk->pagemap.ops = &nouveau_dmem_pagemap_ops; + else + chunk->pagemap.ops = &nouveau_atomic_pagemap_ops; chunk->pagemap.owner = drm->dev; - ret = nouveau_bo_new(&drm->client, DMEM_CHUNK_SIZE, 0, - NOUVEAU_GEM_DOMAIN_VRAM, 0, 0, NULL, NULL, - &chunk->bo); - if (ret) - goto out_release; + if (type == NOUVEAU_DMEM) { + ret = nouveau_bo_new(&drm->client, DMEM_CHUNK_SIZE, 0, + NOUVEAU_GEM_DOMAIN_VRAM, 0, 0, NULL, NULL, + &chunk->bo); + if (ret) + goto out_release; - ret = nouveau_bo_pin(chunk->bo, NOUVEAU_GEM_DOMAIN_VRAM, false); - if (ret) - goto out_bo_free; + ret = nouveau_bo_pin(chunk->bo, NOUVEAU_GEM_DOMAIN_VRAM, false); + if (ret) + goto out_bo_free; + } else { + chunk->atomic_pages = kcalloc(DMEM_CHUNK_NPAGES, + sizeof(void *), GFP_KERNEL); + if (!chunk->atomic_pages) { + ret = -ENOMEM; + goto out_release; + } + } ptr = memremap_pages(&chunk->pagemap, numa_node_id()); if (IS_ERR(ptr)) { @@ -289,8 +405,8 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage, chunk->callocated++; spin_unlock(&drm->dmem->lock); - NV_INFO(drm, "DMEM: registered %ldMB of device memory\n", - DMEM_CHUNK_SIZE >> 20); + NV_INFO(drm, "DMEM: registered %ldMB of device %smemory\n", + DMEM_CHUNK_SIZE >> 20, type == NOUVEAU_ATOMIC ? 
"atomic " : ""); return 0; @@ -306,7 +422,7 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage, return ret; } -static struct page * +struct page * nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, enum nouveau_dmem_type type) { struct nouveau_dmem_chunk *chunk; @@ -382,8 +498,12 @@ nouveau_dmem_fini(struct nouveau_drm *drm) mutex_lock(&drm->dmem->mutex); list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) { - nouveau_bo_unpin(chunk->bo); - nouveau_bo_ref(NULL, &chunk->bo); + if (chunk->type == NOUVEAU_DMEM) { + nouveau_bo_unpin(chunk->bo); + nouveau_bo_ref(NULL, &chunk->bo); + } else { + kfree(chunk->atomic_pages); + } list_del(&chunk->list); memunmap_pages(&chunk->pagemap); release_mem_region(chunk->pagemap.range.start, diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.h b/drivers/gpu/drm/nouveau/nouveau_dmem.h index 02e261c4acf1..6b52a7a8dea4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.h +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.h @@ -30,6 +30,7 @@ struct hmm_range; enum nouveau_dmem_type { NOUVEAU_DMEM, + NOUVEAU_ATOMIC, NOUVEAU_DMEM_NTYPES, /* Number of types, must be last */ }; @@ -45,6 +46,9 @@ int nouveau_dmem_migrate_vma(struct nouveau_drm *drm, unsigned long start, unsigned long end); unsigned long nouveau_dmem_page_addr(struct page *page); +struct page *nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, + enum nouveau_dmem_type type); +void nouveau_dmem_set_atomic(struct page *dpage, struct page *atomic_page); #else /* IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM) */ static inline void nouveau_dmem_init(struct nouveau_drm *drm) {} diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index 63332387402e..e571c6907bfd 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -35,6 +35,7 @@ #include <linux/sched/mm.h> #include <linux/sort.h> #include <linux/hmm.h> +#include <linux/idr.h> struct nouveau_svm { struct nouveau_drm *drm; @@ -421,9 +422,9 @@ nouveau_svm_fault_cmp(const void *a, const void *b) return ret; if ((ret = (s64)fa->addr - fb->addr)) return ret; - /*XXX: atomic? 
*/ - return (fa->access == 0 || fa->access == 3) - - (fb->access == 0 || fb->access == 3); + /* Atomic access (2) has highest priority */ + return (-1*(fa->access == 2) + (fa->access == 0 || fa->access == 3)) - + (-1*(fb->access == 2) + (fb->access == 0 || fb->access == 3)); } static void @@ -555,10 +556,86 @@ static void nouveau_hmm_convert_pfn(struct nouveau_drm *drm, args->p.phys[0] |= NVIF_VMM_PFNMAP_V0_W; } +static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, + struct nouveau_drm *drm, + struct nouveau_pfnmap_args *args, u32 size, + unsigned long hmm_flags, struct mm_struct *mm) +{ + struct page *dpage, *oldpage; + struct migrate_vma migrate_args; + unsigned long src_pfn = MIGRATE_PFN_PIN, dst_pfn = 0, start = args->p.addr; + int ret = 0; + + mmap_read_lock(mm); + + migrate_args.src = &src_pfn; + migrate_args.dst = &dst_pfn; + migrate_args.start = start; + migrate_args.end = migrate_args.start + PAGE_SIZE; + migrate_args.pgmap_owner = NULL; + migrate_args.flags = MIGRATE_VMA_SELECT_SYSTEM; + migrate_args.vma = find_vma_intersection(mm, migrate_args.start, migrate_args.end); + + ret = migrate_vma_setup(&migrate_args); + + if (ret) { + SVMM_ERR(svmm, "Unable to setup atomic page migration"); + ret = -ENOMEM; + goto out; + } + + oldpage = migrate_pfn_to_page(src_pfn); + + if (src_pfn & MIGRATE_PFN_MIGRATE) { + dpage = nouveau_dmem_page_alloc_locked(drm, NOUVEAU_ATOMIC); + if (!dpage) { + SVMM_ERR(svmm, "Unable to allocate atomic page"); + *migrate_args.dst = 0; + } else { + nouveau_dmem_set_atomic(dpage, oldpage); + dpage->zone_device_data = svmm; + *migrate_args.dst = migrate_pfn(page_to_pfn(dpage)) | + MIGRATE_PFN_LOCKED; + } + } + + migrate_vma_pages(&migrate_args); + + /* Map the page on the GPU for successfully migrated mappings. Do this + * before removing the migration PTEs in migrate_vma_finalize() as once + * that happens a CPU thread can race and invalidate the GPU mappings + * prior to mapping them here. 
+ */ + if (src_pfn & MIGRATE_PFN_MIGRATE) { + args->p.page = 12; + args->p.size = PAGE_SIZE; + args->p.addr = start; + args->p.phys[0] = page_to_phys(oldpage) | + NVIF_VMM_PFNMAP_V0_V | + NVIF_VMM_PFNMAP_V0_A | + NVIF_VMM_PFNMAP_V0_HOST; + + if (migrate_args.vma->vm_flags & VM_WRITE) + args->p.phys[0] |= NVIF_VMM_PFNMAP_V0_W; + + mutex_lock(&svmm->mutex); + svmm->vmm->vmm.object.client->super = true; + ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); + svmm->vmm->vmm.object.client->super = false; + mutex_unlock(&svmm->mutex); + } + + migrate_vma_finalize(&migrate_args); + +out: + mmap_read_unlock(mm); + return ret; +} + static int nouveau_range_fault(struct nouveau_svmm *svmm, struct nouveau_drm *drm, struct nouveau_pfnmap_args *args, u32 size, - unsigned long hmm_flags, + unsigned long hmm_flags, int atomic, struct svm_notifier *notifier) { unsigned long timeout @@ -608,12 +685,18 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm, break; } - nouveau_hmm_convert_pfn(drm, &range, args); + if (atomic) { + mutex_unlock(&svmm->mutex); + ret = nouveau_atomic_range_fault(svmm, drm, args, + size, hmm_flags, mm); + } else { + nouveau_hmm_convert_pfn(drm, &range, args); - svmm->vmm->vmm.object.client->super = true; - ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); - svmm->vmm->vmm.object.client->super = false; - mutex_unlock(&svmm->mutex); + svmm->vmm->vmm.object.client->super = true; + ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); + svmm->vmm->vmm.object.client->super = false; + mutex_unlock(&svmm->mutex); + } out: mmu_interval_notifier_remove(¬ifier->notifier); @@ -637,7 +720,7 @@ nouveau_svm_fault(struct nvif_notify *notify) unsigned long hmm_flags; u64 inst, start, limit; int fi, fn; - int replay = 0, ret; + int replay = 0, atomic = 0, ret; /* Parse available fault buffer entries into a cache, and update * the GET pointer so HW can reuse the entries. @@ -718,12 +801,15 @@ nouveau_svm_fault(struct nvif_notify *notify) /* * Determine required permissions based on GPU fault * access flags. - * XXX: atomic? */ switch (buffer->fault[fi]->access) { case 0: /* READ. */ hmm_flags = HMM_PFN_REQ_FAULT; break; + case 2: /* ATOMIC. */ + hmm_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE; + atomic = true; + break; case 3: /* PREFETCH. */ hmm_flags = 0; break; @@ -740,7 +826,7 @@ nouveau_svm_fault(struct nvif_notify *notify) notifier.svmm = svmm; ret = nouveau_range_fault(svmm, svm->drm, &args.i, - sizeof(args), hmm_flags, ¬ifier); + sizeof(args), hmm_flags, atomic, ¬ifier); mmput(mm); limit = args.i.p.addr + args.i.p.size; @@ -760,7 +846,11 @@ nouveau_svm_fault(struct nvif_notify *notify) !(args.phys[0] & NVIF_VMM_PFNMAP_V0_V)) || (buffer->fault[fi]->access != 0 /* READ. */ && buffer->fault[fi]->access != 3 /* PREFETCH. */ && - !(args.phys[0] & NVIF_VMM_PFNMAP_V0_W))) + !(args.phys[0] & NVIF_VMM_PFNMAP_V0_W)) || + (buffer->fault[fi]->access != 0 /* READ. */ && + buffer->fault[fi]->access != 1 /* WRITE. */ && + buffer->fault[fi]->access != 3 /* PREFETCH. 
*/ && + !(args.phys[0] & NVIF_VMM_PFNMAP_V0_A))) break; } diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h index a2b179568970..f6188aa9171c 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h @@ -178,6 +178,7 @@ void nvkm_vmm_unmap_region(struct nvkm_vmm *, struct nvkm_vma *); #define NVKM_VMM_PFN_APER 0x00000000000000f0ULL #define NVKM_VMM_PFN_HOST 0x0000000000000000ULL #define NVKM_VMM_PFN_VRAM 0x0000000000000010ULL +#define NVKM_VMM_PFN_A 0x0000000000000004ULL #define NVKM_VMM_PFN_W 0x0000000000000002ULL #define NVKM_VMM_PFN_V 0x0000000000000001ULL #define NVKM_VMM_PFN_NONE 0x0000000000000000ULL diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c index 236db5570771..f02abd9cb4dd 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c @@ -88,6 +88,9 @@ gp100_vmm_pgt_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt, if (!(*map->pfn & NVKM_VMM_PFN_W)) data |= BIT_ULL(6); /* RO. */ + if (!(*map->pfn & NVKM_VMM_PFN_A)) + data |= BIT_ULL(7); /* Atomic disable. */ + if (!(*map->pfn & NVKM_VMM_PFN_VRAM)) { addr = *map->pfn >> NVKM_VMM_PFN_ADDR_SHIFT; addr = dma_map_page(dev, pfn_to_page(addr), 0, @@ -322,6 +325,9 @@ gp100_vmm_pd0_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt, if (!(*map->pfn & NVKM_VMM_PFN_W)) data |= BIT_ULL(6); /* RO. */ + if (!(*map->pfn & NVKM_VMM_PFN_A)) + data |= BIT_ULL(7); /* Atomic disable. */ + if (!(*map->pfn & NVKM_VMM_PFN_VRAM)) { addr = *map->pfn >> NVKM_VMM_PFN_ADDR_SHIFT; addr = dma_map_page(dev, pfn_to_page(addr), 0, -- 2.20.1
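Putting the two halves of this patch side by side (condensed from the hunks
above, not a complete implementation): the GPU atomic fault pins the system
page and backs the CPU PTE with a NOUVEAU_ATOMIC device private page, while
the later CPU fault on that device private page gives the pinned page back
and revokes the GPU mapping:

/* GPU fault, access type 2 (ATOMIC): nouveau_atomic_range_fault() */
src_pfn = MIGRATE_PFN_PIN;			/* unmap and pin the CPU page */
migrate_vma_setup(&args);
dpage = nouveau_dmem_page_alloc_locked(drm, NOUVEAU_ATOMIC);
nouveau_dmem_set_atomic(dpage, oldpage);	/* remember the pinned page */
dst_pfn = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
migrate_vma_pages(&args);
/* map oldpage on the GPU with NVIF_VMM_PFNMAP_V0_A (atomic allowed) set */
migrate_vma_finalize(&args);

/* CPU fault on the device private page: nouveau_atomic_page_migrate() */
migrate_vma_setup(&args);	/* MIGRATE_VMA_SELECT_DEVICE_PRIVATE */
old_page = nouveau_dpage_to_atomic(dpage);
dst_pfn = migrate_pfn(page_to_pfn(old_page)) |
	  MIGRATE_PFN_LOCKED | MIGRATE_PFN_UNPIN;	/* drop the pin */
migrate_vma_pages(&args);
nouveau_svmm_invalidate(svmm, args.start, args.end);	/* revoke GPU access */
migrate_vma_finalize(&args);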
Daniel Vetter
2021-Feb-09 10:27 UTC
[Nouveau] [PATCH 0/9] Add support for SVM atomics in Nouveau
On Tue, Feb 09, 2021 at 12:07:13PM +1100, Alistair Popple wrote:
> This series adds support to Nouveau for atomic memory operations on OpenCL
> shared virtual memory (SVM). This is achieved using the atomic PTE bits on
> the GPU to only permit atomic operations to system memory when a page is
> not mapped in userspace on the CPU.
> 
> This is implemented by adding a mode to migrate_vma_pages() which unmaps
> and isolates existing pages from the CPU and pins them. The original
> userspace page table entries are migrated to point to device private pages
> allocated by the driver. This allows the driver to enable GPU atomic access
> to the page as it will receive a callback when CPU userspace needs to
> access it.
> 
> In response to this callback the driver revokes the atomic access
> permission from the GPU and migrates entries to point back to the original
> page. The original page is unpinned as part of the migration operation
> which also returns it to the LRU.
> 
> Patch 3 contains the bulk of the memory management changes to implement
> unmap and pin.
> 
> Patches 6-9 extend Nouveau to use the new mode to allow system wide atomics
> for OpenCL SVM to be implemented on Nouveau.
> 
> This has been tested using the latest upstream Mesa userspace with a simple
> OpenCL test program which checks the results of atomic GPU operations on a
> buffer whilst also writing to the same buffer from the CPU.
> 
> Problems yet to be addressed:
> 
> Recent changes to pin_user_pages() prevent the creation of pinned pages in
> ZONE_MOVABLE. This series allows pinned pages to be created in ZONE_MOVABLE
> as attempts to migrate may fail which would be fatal to userspace.
> 
> In this case migration of the pinned page is unnecessary as the page can be
> unpinned at anytime by having the driver revoke atomic permission as it
> does for the migrate_to_ram() callback. However a method of calling this
> when memory needs to be moved has yet to be resolved so any discussion is
> welcome.

Why do we need to pin for gpu atomics? You still have the callback for cpu
faults, so you can move the page as needed, and hence a long-term pin sounds
like the wrong approach.

That would avoid all the hacking around long term pin constraints, because
for real unmoveable long term pinned memory we really want to have all these
checks. So I think we might be missing some other callbacks to be able to
move these pages, instead of abusing longterm pins for lack of better tools.

Cheers, Daniel

> 
> Alistair Popple (9):
>   mm/migrate.c: Always allow device private pages to migrate
>   mm/migrate.c: Allow pfn flags to be passed to migrate_vma_setup()
>   mm/migrate: Add a unmap and pin migration mode
>   Documentation: Add unmap and pin to HMM
>   hmm-tests: Add test for unmap and pin
>   nouveau/dmem: Only map migrating pages
>   nouveau/svm: Refactor nouveau_range_fault
>   nouveau/dmem: Add support for multiple page types
>   nouveau/svm: Implement atomic SVM access
> 
>  Documentation/vm/hmm.rst                      |  22 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c            |   4 +-
>  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
>  drivers/gpu/drm/nouveau/nouveau_dmem.c        | 190 +++++++++++++++---
>  drivers/gpu/drm/nouveau/nouveau_dmem.h        |   9 +
>  drivers/gpu/drm/nouveau/nouveau_svm.c         | 148 +++++++++++---
>  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
>  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
>  include/linux/migrate.h                       |   2 +
>  include/linux/migrate_mode.h                  |   1 +
>  lib/test_hmm.c                                | 109 ++++++++--
>  lib/test_hmm_uapi.h                           |   1 +
>  mm/migrate.c                                  |  82 +++++---
>  tools/testing/selftests/vm/hmm-tests.c        |  49 +++++
>  14 files changed, 524 insertions(+), 101 deletions(-)
> 
> -- 
> 2.20.1
> 
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch