Christoph Hellwig
2020-Mar-16 19:32 UTC
[Nouveau] ensure device private pages have an owner v2
When acting on device private mappings a driver needs to know if the device (or other entity in case of kvmppc) actually owns this private mapping. This series adds an owner field and converts the migrate_vma code over to check it. I looked into doing the same for hmm_range_fault, but as far as I can tell that code has never been wired up to actually work for device private memory, so instead of trying to fix some unused code the second patch just remove the code. We can add it back once we have a working and fully tested code, and then should pass the expected owner in the hmm_range structure. Changes since v1: - split out the pgmap->owner addition into a separate patch - check pgmap->owner is set for device private mappings - rename the dev_private_owner field in struct migrate_vma to src_owner - refuse to migrate private pages if src_owner is not set - keep the non-fault device private handling in hmm_range_fault
Christoph Hellwig
2020-Mar-16 19:32 UTC
[Nouveau] [PATCH 1/4] memremap: add an owner field to struct dev_pagemap
Add a new opaque owner field to struct dev_pagemap, which will allow the hmm and migrate_vma code to identify who owns ZONE_DEVICE memory, and refuse to work on mappings not owned by the calling entity. Signed-off-by: Christoph Hellwig <hch at lst.de> --- arch/powerpc/kvm/book3s_hv_uvmem.c | 2 ++ drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 + include/linux/memremap.h | 4 ++++ mm/memremap.c | 4 ++++ 4 files changed, 11 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index 79b1202b1c62..67fefb03b9b7 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -779,6 +779,8 @@ int kvmppc_uvmem_init(void) kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE; kvmppc_uvmem_pgmap.res = *res; kvmppc_uvmem_pgmap.ops = &kvmppc_uvmem_ops; + /* just one global instance: */ + kvmppc_uvmem_pgmap.owner = &kvmppc_uvmem_pgmap; addr = memremap_pages(&kvmppc_uvmem_pgmap, NUMA_NO_NODE); if (IS_ERR(addr)) { ret = PTR_ERR(addr); diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 0ad5d87b5a8e..a4682272586e 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -526,6 +526,7 @@ nouveau_dmem_init(struct nouveau_drm *drm) drm->dmem->pagemap.type = MEMORY_DEVICE_PRIVATE; drm->dmem->pagemap.res = *res; drm->dmem->pagemap.ops = &nouveau_dmem_pagemap_ops; + drm->dmem->pagemap.owner = drm->dev; if (IS_ERR(devm_memremap_pages(device, &drm->dmem->pagemap))) goto out_free; diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 6fefb09af7c3..60d97e8fd3c0 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -103,6 +103,9 @@ struct dev_pagemap_ops { * @type: memory type: see MEMORY_* in memory_hotplug.h * @flags: PGMAP_* flags to specify defailed behavior * @ops: method table + * @owner: an opaque pointer identifying the entity that manages this + * instance. Used by various helpers to make sure that no + * foreign ZONE_DEVICE memory is accessed. */ struct dev_pagemap { struct vmem_altmap altmap; @@ -113,6 +116,7 @@ struct dev_pagemap { enum memory_type type; unsigned int flags; const struct dev_pagemap_ops *ops; + void *owner; }; static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap) diff --git a/mm/memremap.c b/mm/memremap.c index 09b5b7adc773..9b2c97ceb775 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -181,6 +181,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) WARN(1, "Missing migrate_to_ram method\n"); return ERR_PTR(-EINVAL); } + if (!pgmap->owner) { + WARN(1, "Missing owner\n"); + return ERR_PTR(-EINVAL); + } break; case MEMORY_DEVICE_FS_DAX: if (!IS_ENABLED(CONFIG_ZONE_DEVICE) || -- 2.24.1
Christoph Hellwig
2020-Mar-16 19:32 UTC
[Nouveau] [PATCH 2/4] mm: handle multiple owners of device private pages in migrate_vma
Add a new src_owner field to struct migrate_vma. If the field is set, only device private pages with page->pgmap->owner equal to that field are migrated. If the field is not set only "normal" pages are migrated. Signed-off-by: Christoph Hellwig <hch at lst.de> Fixes: df6ad69838fc ("mm/device-public-memory: device memory cache coherent with CPU") --- arch/powerpc/kvm/book3s_hv_uvmem.c | 1 + drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 + include/linux/migrate.h | 8 ++++++++ mm/migrate.c | 9 ++++++--- 4 files changed, 16 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index 67fefb03b9b7..f44f6b27950f 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -563,6 +563,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start, mig.end = end; mig.src = &src_pfn; mig.dst = &dst_pfn; + mig.src_owner = &kvmppc_uvmem_pgmap; mutex_lock(&kvm->arch.uvmem_lock); /* The requested page is already paged-out, nothing to do */ diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index a4682272586e..0e36345d395c 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -176,6 +176,7 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf) .end = vmf->address + PAGE_SIZE, .src = &src, .dst = &dst, + .src_owner = drm->dev, }; /* diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 72120061b7d4..3e546cbf03dd 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -196,6 +196,14 @@ struct migrate_vma { unsigned long npages; unsigned long start; unsigned long end; + + /* + * Set to the owner value also stored in page->pgmap->owner for + * migrating out of device private memory. If set only device + * private pages with this owner are migrated. If not set + * device private pages are not migrated at all. + */ + void *src_owner; }; int migrate_vma_setup(struct migrate_vma *args); diff --git a/mm/migrate.c b/mm/migrate.c index b1092876e537..7605d2c23433 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2241,7 +2241,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, arch_enter_lazy_mmu_mode(); for (; addr < end; addr += PAGE_SIZE, ptep++) { - unsigned long mpfn, pfn; + unsigned long mpfn = 0, pfn; struct page *page; swp_entry_t entry; pte_t pte; @@ -2255,8 +2255,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, } if (!pte_present(pte)) { - mpfn = 0; - /* * Only care about unaddressable device page special * page table entry. Other special swap entries are not @@ -2267,11 +2265,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, goto next; page = device_private_entry_to_page(entry); + if (page->pgmap->owner != migrate->src_owner) + goto next; + mpfn = migrate_pfn(page_to_pfn(page)) | MIGRATE_PFN_MIGRATE; if (is_write_device_private_entry(entry)) mpfn |= MIGRATE_PFN_WRITE; } else { + if (migrate->src_owner) + goto next; pfn = pte_pfn(pte); if (is_zero_pfn(pfn)) { mpfn = MIGRATE_PFN_MIGRATE; -- 2.24.1
Christoph Hellwig
2020-Mar-16 19:32 UTC
[Nouveau] [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault
Remove the code to fault device private pages back into system memory that has never been used by any driver. Also replace the usage of the HMM_PFN_DEVICE_PRIVATE flag in the pfns array with a simple is_device_private_page check in nouveau. Signed-off-by: Christoph Hellwig <hch at lst.de> --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 - drivers/gpu/drm/nouveau/nouveau_dmem.c | 5 +++-- drivers/gpu/drm/nouveau/nouveau_svm.c | 1 - include/linux/hmm.h | 2 -- mm/hmm.c | 25 +++++-------------------- 5 files changed, 8 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index dee446278417..90821ce5e6ca 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -776,7 +776,6 @@ struct amdgpu_ttm_tt { static const uint64_t hmm_range_flags[HMM_PFN_FLAG_MAX] = { (1 << 0), /* HMM_PFN_VALID */ (1 << 1), /* HMM_PFN_WRITE */ - 0 /* HMM_PFN_DEVICE_PRIVATE */ }; static const uint64_t hmm_range_values[HMM_PFN_VALUE_MAX] = { diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 0e36345d395c..edfd0805fba4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -28,6 +28,7 @@ #include <nvif/class.h> #include <nvif/object.h> +#include <nvif/if000c.h> #include <nvif/if500b.h> #include <nvif/if900b.h> @@ -692,9 +693,8 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm, if (page == NULL) continue; - if (!(range->pfns[i] & range->flags[HMM_PFN_DEVICE_PRIVATE])) { + if (!is_device_private_page(page)) continue; - } if (!nouveau_dmem_page(drm, page)) { WARN(1, "Some unknown device memory !\n"); @@ -705,5 +705,6 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm, addr = nouveau_dmem_page_addr(page); range->pfns[i] &= ((1UL << range->pfn_shift) - 1); range->pfns[i] |= (addr >> PAGE_SHIFT) << range->pfn_shift; + range->pfns[i] |= NVIF_VMM_PFNMAP_V0_VRAM; } } diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index df9bf1fd1bc0..39c731a99937 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -367,7 +367,6 @@ static const u64 nouveau_svm_pfn_flags[HMM_PFN_FLAG_MAX] = { [HMM_PFN_VALID ] = NVIF_VMM_PFNMAP_V0_V, [HMM_PFN_WRITE ] = NVIF_VMM_PFNMAP_V0_W, - [HMM_PFN_DEVICE_PRIVATE] = NVIF_VMM_PFNMAP_V0_VRAM, }; static const u64 diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 4bf8d6997b12..5e6034f105c3 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -74,7 +74,6 @@ * Flags: * HMM_PFN_VALID: pfn is valid. It has, at least, read permission. * HMM_PFN_WRITE: CPU page table has write permission set - * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE) * * The driver provides a flags array for mapping page protections to device * PTE bits. If the driver valid bit for an entry is bit 3, @@ -86,7 +85,6 @@ enum hmm_pfn_flag_e { HMM_PFN_VALID = 0, HMM_PFN_WRITE, - HMM_PFN_DEVICE_PRIVATE, HMM_PFN_FLAG_MAX }; diff --git a/mm/hmm.c b/mm/hmm.c index 180e398170b0..cfad65f6a67b 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -118,15 +118,6 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk, /* We aren't ask to do anything ... */ if (!(pfns & range->flags[HMM_PFN_VALID])) return; - /* If this is device memory then only fault if explicitly requested */ - if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) { - /* Do we fault on device memory ? */ - if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) { - *write_fault = pfns & range->flags[HMM_PFN_WRITE]; - *fault = true; - } - return; - } /* If CPU page table is not valid then we need to fault */ *fault = !(cpu_flags & range->flags[HMM_PFN_VALID]); @@ -260,21 +251,15 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, swp_entry_t entry = pte_to_swp_entry(pte); /* - * This is a special swap entry, ignore migration, use - * device and report anything else as error. + * Never fault in device private pages pages, but just report + * the PFN even if not present. */ if (is_device_private_entry(entry)) { - cpu_flags = range->flags[HMM_PFN_VALID] | - range->flags[HMM_PFN_DEVICE_PRIVATE]; - cpu_flags |= is_write_device_private_entry(entry) ? - range->flags[HMM_PFN_WRITE] : 0; - hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, - &fault, &write_fault); - if (fault || write_fault) - goto fault; *pfn = hmm_device_entry_from_pfn(range, swp_offset(entry)); - *pfn |= cpu_flags; + *pfn |= range->flags[HMM_PFN_VALID]; + if (is_write_device_private_entry(entry)) + *pfn |= range->flags[HMM_PFN_WRITE]; return 0; } -- 2.24.1
Christoph Hellwig
2020-Mar-16 19:32 UTC
[Nouveau] [PATCH 4/4] mm: check the device private page owner in hmm_range_fault
Hmm range fault will succeed for any kind of device private memory, even if it doesn't belong to the calling entity. While nouveau has some crude checks for that, they are broken because they assume nouveau is the only user of device private memory. Fix this by passing in an expected pgmap owner in the hmm_range_fault structure. Signed-off-by: Christoph Hellwig <hch at lst.de> Fixes: 4ef589dc9b10 ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE") --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 12 ------------ include/linux/hmm.h | 2 ++ mm/hmm.c | 10 +++++++++- 3 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index edfd0805fba4..ad89e09a0be3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -672,12 +672,6 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm, return ret; } -static inline bool -nouveau_dmem_page(struct nouveau_drm *drm, struct page *page) -{ - return is_device_private_page(page) && drm->dmem == page_to_dmem(page); -} - void nouveau_dmem_convert_pfn(struct nouveau_drm *drm, struct hmm_range *range) @@ -696,12 +690,6 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm, if (!is_device_private_page(page)) continue; - if (!nouveau_dmem_page(drm, page)) { - WARN(1, "Some unknown device memory !\n"); - range->pfns[i] = 0; - continue; - } - addr = nouveau_dmem_page_addr(page); range->pfns[i] &= ((1UL << range->pfn_shift) - 1); range->pfns[i] |= (addr >> PAGE_SHIFT) << range->pfn_shift; diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 5e6034f105c3..bb6be4428633 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -132,6 +132,7 @@ enum hmm_pfn_value_e { * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT) * @valid: pfns array did not change since it has been fill by an HMM function + * @dev_private_owner: owner of device private pages */ struct hmm_range { struct mmu_interval_notifier *notifier; @@ -144,6 +145,7 @@ struct hmm_range { uint64_t default_flags; uint64_t pfn_flags_mask; uint8_t pfn_shift; + void *dev_private_owner; }; /* diff --git a/mm/hmm.c b/mm/hmm.c index cfad65f6a67b..b75b3750e03d 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -216,6 +216,14 @@ int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr, unsigned long end, uint64_t *pfns, pmd_t pmd); #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static inline bool hmm_is_device_private_entry(struct hmm_range *range, + swp_entry_t entry) +{ + return is_device_private_entry(entry) && + device_private_entry_to_page(entry)->pgmap->owner =+ range->dev_private_owner; +} + static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte) { if (pte_none(pte) || !pte_present(pte) || pte_protnone(pte)) @@ -254,7 +262,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, * Never fault in device private pages pages, but just report * the PFN even if not present. */ - if (is_device_private_entry(entry)) { + if (hmm_is_device_private_entry(range, entry)) { *pfn = hmm_device_entry_from_pfn(range, swp_offset(entry)); *pfn |= range->flags[HMM_PFN_VALID]; -- 2.24.1
Jason Gunthorpe
2020-Mar-16 19:49 UTC
[Nouveau] [PATCH 4/4] mm: check the device private page owner in hmm_range_fault
On Mon, Mar 16, 2020 at 08:32:16PM +0100, Christoph Hellwig wrote:> Hmm range fault will succeed for any kind of device private memory, > even if it doesn't belong to the calling entity. While nouveau > has some crude checks for that, they are broken because they assume > nouveau is the only user of device private memory. Fix this by > passing in an expected pgmap owner in the hmm_range_fault structure. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > Fixes: 4ef589dc9b10 ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE") > --- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 12 ------------ > include/linux/hmm.h | 2 ++ > mm/hmm.c | 10 +++++++++- > 3 files changed, 11 insertions(+), 13 deletions(-)Nice Reviewed-by: Jason Gunthorpe <jgg at mellanox.com> Jason
Jason Gunthorpe
2020-Mar-16 19:59 UTC
[Nouveau] [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault
On Mon, Mar 16, 2020 at 08:32:15PM +0100, Christoph Hellwig wrote:> diff --git a/mm/hmm.c b/mm/hmm.c > index 180e398170b0..cfad65f6a67b 100644 > +++ b/mm/hmm.c > @@ -118,15 +118,6 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk, > /* We aren't ask to do anything ... */ > if (!(pfns & range->flags[HMM_PFN_VALID])) > return; > - /* If this is device memory then only fault if explicitly requested */ > - if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) { > - /* Do we fault on device memory ? */ > - if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) { > - *write_fault = pfns & range->flags[HMM_PFN_WRITE]; > - *fault = true; > - } > - return; > - }Yes, this is an elegant solution to the input flags. However, between patch 3 and 4 doesn't this break amd gpu as it will return device_private pages now if not requested? Squash the two? Jason
Ralph Campbell
2020-Mar-16 20:55 UTC
[Nouveau] [PATCH 1/4] memremap: add an owner field to struct dev_pagemap
On 3/16/20 12:32 PM, Christoph Hellwig wrote:> Add a new opaque owner field to struct dev_pagemap, which will allow > the hmm and migrate_vma code to identify who owns ZONE_DEVICE memory, > and refuse to work on mappings not owned by the calling entity. > > Signed-off-by: Christoph Hellwig <hch at lst.de>This looks like a reasonable approach to take. Reviewed-by: Ralph Campbell <rcampbell at nvidia.com>> --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 ++ > drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 + > include/linux/memremap.h | 4 ++++ > mm/memremap.c | 4 ++++ > 4 files changed, 11 insertions(+) > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > index 79b1202b1c62..67fefb03b9b7 100644 > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > @@ -779,6 +779,8 @@ int kvmppc_uvmem_init(void) > kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE; > kvmppc_uvmem_pgmap.res = *res; > kvmppc_uvmem_pgmap.ops = &kvmppc_uvmem_ops; > + /* just one global instance: */ > + kvmppc_uvmem_pgmap.owner = &kvmppc_uvmem_pgmap; > addr = memremap_pages(&kvmppc_uvmem_pgmap, NUMA_NO_NODE); > if (IS_ERR(addr)) { > ret = PTR_ERR(addr); > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > index 0ad5d87b5a8e..a4682272586e 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > @@ -526,6 +526,7 @@ nouveau_dmem_init(struct nouveau_drm *drm) > drm->dmem->pagemap.type = MEMORY_DEVICE_PRIVATE; > drm->dmem->pagemap.res = *res; > drm->dmem->pagemap.ops = &nouveau_dmem_pagemap_ops; > + drm->dmem->pagemap.owner = drm->dev; > if (IS_ERR(devm_memremap_pages(device, &drm->dmem->pagemap))) > goto out_free; > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 6fefb09af7c3..60d97e8fd3c0 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -103,6 +103,9 @@ struct dev_pagemap_ops { > * @type: memory type: see MEMORY_* in memory_hotplug.h > * @flags: PGMAP_* flags to specify defailed behavior > * @ops: method table > + * @owner: an opaque pointer identifying the entity that manages this > + * instance. Used by various helpers to make sure that no > + * foreign ZONE_DEVICE memory is accessed. > */ > struct dev_pagemap { > struct vmem_altmap altmap; > @@ -113,6 +116,7 @@ struct dev_pagemap { > enum memory_type type; > unsigned int flags; > const struct dev_pagemap_ops *ops; > + void *owner; > }; > > static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap) > diff --git a/mm/memremap.c b/mm/memremap.c > index 09b5b7adc773..9b2c97ceb775 100644 > --- a/mm/memremap.c > +++ b/mm/memremap.c > @@ -181,6 +181,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) > WARN(1, "Missing migrate_to_ram method\n"); > return ERR_PTR(-EINVAL); > } > + if (!pgmap->owner) { > + WARN(1, "Missing owner\n"); > + return ERR_PTR(-EINVAL); > + } > break; > case MEMORY_DEVICE_FS_DAX: > if (!IS_ENABLED(CONFIG_ZONE_DEVICE) || >
Ralph Campbell
2020-Mar-16 21:43 UTC
[Nouveau] [PATCH 2/4] mm: handle multiple owners of device private pages in migrate_vma
On 3/16/20 12:32 PM, Christoph Hellwig wrote:> Add a new src_owner field to struct migrate_vma. If the field is set, > only device private pages with page->pgmap->owner equal to that field > are migrated. If the field is not set only "normal" pages are migrated. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > Fixes: df6ad69838fc ("mm/device-public-memory: device memory cache coherent with CPU")When migrating to device private memory, setting the src_owner lets the caller know about source pages that are already migrated and skips pages migrated to a different device similar to pages swapped out an actual swap device. But, it prevents normal pages from being migrated to device private memory. It can still be useful for the driver to know that a page is owned by a different device if it has a device to device way of migrating data. nouveau_dmem_migrate_vma() isn't setting args.src_owner so only normal pages will be migrated which I guess is OK for now since nouveau doesn't handle direct GPU to GPU migration currently. When migrating device private memory to system memory due to a CPU fault, the source page should be the device's device private struct page so if it isn't, then it does make sense to not migrate whatever normal page is there. nouveau_dmem_migrate_to_ram() sets src_owner so this case looks OK. Just had to think this through. Reviewed-by: Ralph Campbell <rcampbell at nvidia.com>> --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 1 + > drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 + > include/linux/migrate.h | 8 ++++++++ > mm/migrate.c | 9 ++++++--- > 4 files changed, 16 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > index 67fefb03b9b7..f44f6b27950f 100644 > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > @@ -563,6 +563,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start, > mig.end = end; > mig.src = &src_pfn; > mig.dst = &dst_pfn; > + mig.src_owner = &kvmppc_uvmem_pgmap; > > mutex_lock(&kvm->arch.uvmem_lock); > /* The requested page is already paged-out, nothing to do */ > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > index a4682272586e..0e36345d395c 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > @@ -176,6 +176,7 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf) > .end = vmf->address + PAGE_SIZE, > .src = &src, > .dst = &dst, > + .src_owner = drm->dev, > }; > > /* > diff --git a/include/linux/migrate.h b/include/linux/migrate.h > index 72120061b7d4..3e546cbf03dd 100644 > --- a/include/linux/migrate.h > +++ b/include/linux/migrate.h > @@ -196,6 +196,14 @@ struct migrate_vma { > unsigned long npages; > unsigned long start; > unsigned long end; > + > + /* > + * Set to the owner value also stored in page->pgmap->owner for > + * migrating out of device private memory. If set only device > + * private pages with this owner are migrated. If not set > + * device private pages are not migrated at all. > + */ > + void *src_owner; > }; > > int migrate_vma_setup(struct migrate_vma *args); > diff --git a/mm/migrate.c b/mm/migrate.c > index b1092876e537..7605d2c23433 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -2241,7 +2241,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, > arch_enter_lazy_mmu_mode(); > > for (; addr < end; addr += PAGE_SIZE, ptep++) { > - unsigned long mpfn, pfn; > + unsigned long mpfn = 0, pfn; > struct page *page; > swp_entry_t entry; > pte_t pte; > @@ -2255,8 +2255,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, > } > > if (!pte_present(pte)) { > - mpfn = 0; > - > /* > * Only care about unaddressable device page special > * page table entry. Other special swap entries are not > @@ -2267,11 +2265,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, > goto next; > > page = device_private_entry_to_page(entry); > + if (page->pgmap->owner != migrate->src_owner) > + goto next; > + > mpfn = migrate_pfn(page_to_pfn(page)) | > MIGRATE_PFN_MIGRATE; > if (is_write_device_private_entry(entry)) > mpfn |= MIGRATE_PFN_WRITE; > } else { > + if (migrate->src_owner) > + goto next; > pfn = pte_pfn(pte); > if (is_zero_pfn(pfn)) { > mpfn = MIGRATE_PFN_MIGRATE; >
Ralph Campbell
2020-Mar-16 22:49 UTC
[Nouveau] [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault
On 3/16/20 12:32 PM, Christoph Hellwig wrote:> Remove the code to fault device private pages back into system memory > that has never been used by any driver. Also replace the usage of the > HMM_PFN_DEVICE_PRIVATE flag in the pfns array with a simple > is_device_private_page check in nouveau. > > Signed-off-by: Christoph Hellwig <hch at lst.de>Getting rid of HMM_PFN_DEVICE_PRIVATE seems reasonable to me since a driver can look at the struct page but what if a driver needs to fault in a page from another device's private memory? Should it call handle_mm_fault()?> --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 - > drivers/gpu/drm/nouveau/nouveau_dmem.c | 5 +++-- > drivers/gpu/drm/nouveau/nouveau_svm.c | 1 - > include/linux/hmm.h | 2 -- > mm/hmm.c | 25 +++++-------------------- > 5 files changed, 8 insertions(+), 26 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index dee446278417..90821ce5e6ca 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -776,7 +776,6 @@ struct amdgpu_ttm_tt { > static const uint64_t hmm_range_flags[HMM_PFN_FLAG_MAX] = { > (1 << 0), /* HMM_PFN_VALID */ > (1 << 1), /* HMM_PFN_WRITE */ > - 0 /* HMM_PFN_DEVICE_PRIVATE */ > }; > > static const uint64_t hmm_range_values[HMM_PFN_VALUE_MAX] = { > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > index 0e36345d395c..edfd0805fba4 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > @@ -28,6 +28,7 @@ > > #include <nvif/class.h> > #include <nvif/object.h> > +#include <nvif/if000c.h> > #include <nvif/if500b.h> > #include <nvif/if900b.h> > > @@ -692,9 +693,8 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm, > if (page == NULL) > continue; > > - if (!(range->pfns[i] & range->flags[HMM_PFN_DEVICE_PRIVATE])) { > + if (!is_device_private_page(page)) > continue; > - } > > if (!nouveau_dmem_page(drm, page)) { > WARN(1, "Some unknown device memory !\n"); > @@ -705,5 +705,6 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm, > addr = nouveau_dmem_page_addr(page); > range->pfns[i] &= ((1UL << range->pfn_shift) - 1); > range->pfns[i] |= (addr >> PAGE_SHIFT) << range->pfn_shift; > + range->pfns[i] |= NVIF_VMM_PFNMAP_V0_VRAM; > } > } > diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c > index df9bf1fd1bc0..39c731a99937 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_svm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c > @@ -367,7 +367,6 @@ static const u64 > nouveau_svm_pfn_flags[HMM_PFN_FLAG_MAX] = { > [HMM_PFN_VALID ] = NVIF_VMM_PFNMAP_V0_V, > [HMM_PFN_WRITE ] = NVIF_VMM_PFNMAP_V0_W, > - [HMM_PFN_DEVICE_PRIVATE] = NVIF_VMM_PFNMAP_V0_VRAM, > }; > > static const u64 > diff --git a/include/linux/hmm.h b/include/linux/hmm.h > index 4bf8d6997b12..5e6034f105c3 100644 > --- a/include/linux/hmm.h > +++ b/include/linux/hmm.h > @@ -74,7 +74,6 @@ > * Flags: > * HMM_PFN_VALID: pfn is valid. It has, at least, read permission. > * HMM_PFN_WRITE: CPU page table has write permission set > - * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE) > * > * The driver provides a flags array for mapping page protections to device > * PTE bits. If the driver valid bit for an entry is bit 3, > @@ -86,7 +85,6 @@ > enum hmm_pfn_flag_e { > HMM_PFN_VALID = 0, > HMM_PFN_WRITE, > - HMM_PFN_DEVICE_PRIVATE, > HMM_PFN_FLAG_MAX > }; > > diff --git a/mm/hmm.c b/mm/hmm.c > index 180e398170b0..cfad65f6a67b 100644 > --- a/mm/hmm.c > +++ b/mm/hmm.c > @@ -118,15 +118,6 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk, > /* We aren't ask to do anything ... */ > if (!(pfns & range->flags[HMM_PFN_VALID])) > return; > - /* If this is device memory then only fault if explicitly requested */ > - if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) { > - /* Do we fault on device memory ? */ > - if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) { > - *write_fault = pfns & range->flags[HMM_PFN_WRITE]; > - *fault = true; > - } > - return; > - } > > /* If CPU page table is not valid then we need to fault */ > *fault = !(cpu_flags & range->flags[HMM_PFN_VALID]); > @@ -260,21 +251,15 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, > swp_entry_t entry = pte_to_swp_entry(pte); > > /* > - * This is a special swap entry, ignore migration, use > - * device and report anything else as error. > + * Never fault in device private pages pages, but just report > + * the PFN even if not present. > */ > if (is_device_private_entry(entry)) { > - cpu_flags = range->flags[HMM_PFN_VALID] | > - range->flags[HMM_PFN_DEVICE_PRIVATE]; > - cpu_flags |= is_write_device_private_entry(entry) ? > - range->flags[HMM_PFN_WRITE] : 0; > - hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, > - &fault, &write_fault); > - if (fault || write_fault) > - goto fault; > *pfn = hmm_device_entry_from_pfn(range, > swp_offset(entry)); > - *pfn |= cpu_flags; > + *pfn |= range->flags[HMM_PFN_VALID]; > + if (is_write_device_private_entry(entry)) > + *pfn |= range->flags[HMM_PFN_WRITE]; > return 0; > } > >
Ralph Campbell
2020-Mar-16 23:11 UTC
[Nouveau] [PATCH 4/4] mm: check the device private page owner in hmm_range_fault
On 3/16/20 12:32 PM, Christoph Hellwig wrote:> Hmm range fault will succeed for any kind of device private memory, > even if it doesn't belong to the calling entity. While nouveau > has some crude checks for that, they are broken because they assume > nouveau is the only user of device private memory. Fix this by > passing in an expected pgmap owner in the hmm_range_fault structure. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > Fixes: 4ef589dc9b10 ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")Looks good. Reviewed-by: Ralph Campbell <rcampbell at nvidia.com>> --- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 12 ------------ > include/linux/hmm.h | 2 ++ > mm/hmm.c | 10 +++++++++- > 3 files changed, 11 insertions(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > index edfd0805fba4..ad89e09a0be3 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > @@ -672,12 +672,6 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm, > return ret; > } > > -static inline bool > -nouveau_dmem_page(struct nouveau_drm *drm, struct page *page) > -{ > - return is_device_private_page(page) && drm->dmem == page_to_dmem(page); > -} > - > void > nouveau_dmem_convert_pfn(struct nouveau_drm *drm, > struct hmm_range *range) > @@ -696,12 +690,6 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm, > if (!is_device_private_page(page)) > continue; > > - if (!nouveau_dmem_page(drm, page)) { > - WARN(1, "Some unknown device memory !\n"); > - range->pfns[i] = 0; > - continue; > - } > - > addr = nouveau_dmem_page_addr(page); > range->pfns[i] &= ((1UL << range->pfn_shift) - 1); > range->pfns[i] |= (addr >> PAGE_SHIFT) << range->pfn_shift; > diff --git a/include/linux/hmm.h b/include/linux/hmm.h > index 5e6034f105c3..bb6be4428633 100644 > --- a/include/linux/hmm.h > +++ b/include/linux/hmm.h > @@ -132,6 +132,7 @@ enum hmm_pfn_value_e { > * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter > * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT) > * @valid: pfns array did not change since it has been fill by an HMM function > + * @dev_private_owner: owner of device private pages > */ > struct hmm_range { > struct mmu_interval_notifier *notifier; > @@ -144,6 +145,7 @@ struct hmm_range { > uint64_t default_flags; > uint64_t pfn_flags_mask; > uint8_t pfn_shift; > + void *dev_private_owner; > }; > > /* > diff --git a/mm/hmm.c b/mm/hmm.c > index cfad65f6a67b..b75b3750e03d 100644 > --- a/mm/hmm.c > +++ b/mm/hmm.c > @@ -216,6 +216,14 @@ int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr, > unsigned long end, uint64_t *pfns, pmd_t pmd); > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ > > +static inline bool hmm_is_device_private_entry(struct hmm_range *range, > + swp_entry_t entry) > +{ > + return is_device_private_entry(entry) && > + device_private_entry_to_page(entry)->pgmap->owner => + range->dev_private_owner; > +} > + > static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte) > { > if (pte_none(pte) || !pte_present(pte) || pte_protnone(pte)) > @@ -254,7 +262,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, > * Never fault in device private pages pages, but just report > * the PFN even if not present. > */ > - if (is_device_private_entry(entry)) { > + if (hmm_is_device_private_entry(range, entry)) { > *pfn = hmm_device_entry_from_pfn(range, > swp_offset(entry)); > *pfn |= range->flags[HMM_PFN_VALID]; >
Bharata B Rao
2020-Mar-17 05:31 UTC
[Nouveau] ensure device private pages have an owner v2
On Mon, Mar 16, 2020 at 08:32:12PM +0100, Christoph Hellwig wrote:> When acting on device private mappings a driver needs to know if the > device (or other entity in case of kvmppc) actually owns this private > mapping. This series adds an owner field and converts the migrate_vma > code over to check it. I looked into doing the same for > hmm_range_fault, but as far as I can tell that code has never been > wired up to actually work for device private memory, so instead of > trying to fix some unused code the second patch just remove the code. > We can add it back once we have a working and fully tested code, and > then should pass the expected owner in the hmm_range structure.Boot-tested a pseries secure guest with this change (1/4 and 2/4 only) So Tested-by: Bharata B Rao <bharata at linux.ibm.com> for the above two patches. Regards, Bharata.
Jason Gunthorpe
2020-Mar-19 00:28 UTC
[Nouveau] ensure device private pages have an owner v2
On Mon, Mar 16, 2020 at 08:32:12PM +0100, Christoph Hellwig wrote:> When acting on device private mappings a driver needs to know if the > device (or other entity in case of kvmppc) actually owns this private > mapping. This series adds an owner field and converts the migrate_vma > code over to check it. I looked into doing the same for > hmm_range_fault, but as far as I can tell that code has never been > wired up to actually work for device private memory, so instead of > trying to fix some unused code the second patch just remove the code. > We can add it back once we have a working and fully tested code, and > then should pass the expected owner in the hmm_range structure. > > Changes since v1: > - split out the pgmap->owner addition into a separate patch > - check pgmap->owner is set for device private mappings > - rename the dev_private_owner field in struct migrate_vma to src_owner > - refuse to migrate private pages if src_owner is not set > - keep the non-fault device private handling in hmm_range_faultI'm happy enough to take this, did you have plans for a v3? Thanks, Jason
Christoph Hellwig
2020-Mar-19 07:16 UTC
[Nouveau] ensure device private pages have an owner v2
On Wed, Mar 18, 2020 at 09:28:49PM -0300, Jason Gunthorpe wrote:> > Changes since v1: > > - split out the pgmap->owner addition into a separate patch > > - check pgmap->owner is set for device private mappings > > - rename the dev_private_owner field in struct migrate_vma to src_owner > > - refuse to migrate private pages if src_owner is not set > > - keep the non-fault device private handling in hmm_range_fault > > I'm happy enough to take this, did you have plans for a v3?I think the only open question is if merging 3 and 4 might make sense. It's up to you if you want it resent that way or not.
Jason Gunthorpe
2020-Mar-20 13:41 UTC
[Nouveau] [PATCH 4/4] mm: check the device private page owner in hmm_range_fault
On Mon, Mar 16, 2020 at 08:32:16PM +0100, Christoph Hellwig wrote:> diff --git a/mm/hmm.c b/mm/hmm.c > index cfad65f6a67b..b75b3750e03d 100644 > +++ b/mm/hmm.c > @@ -216,6 +216,14 @@ int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr, > unsigned long end, uint64_t *pfns, pmd_t pmd); > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ > > +static inline bool hmm_is_device_private_entry(struct hmm_range *range, > + swp_entry_t entry) > +{ > + return is_device_private_entry(entry) && > + device_private_entry_to_page(entry)->pgmap->owner => + range->dev_private_owner; > +}Thinking about this some more, does the locking work out here? hmm_range_fault() runs with mmap_sem in read, and does not lock any of the page table levels. So it relies on accessing stale pte data being safe, and here we introduce for the first time a page pointer dereference and a pgmap dereference without any locking/refcounting. The get_dev_pagemap() worked on the PFN and obtained a refcount, so it created safety. Is there some tricky reason this is safe, eg a DEVICE_PRIVATE page cannot be removed from the vma without holding mmap_sem in write or something? Jason
Possibly Parallel Threads
- [PATCH 2/2] mm: remove device private page support from hmm_range_fault
- [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault
- [PATCH 2/2] mm: remove device private page support from hmm_range_fault
- [PATCH hmm 5/5] mm/hmm: remove the customizable pfn format from hmm_range_fault
- [PATCH hmm 5/5] mm/hmm: remove the customizable pfn format from hmm_range_fault