Dan Williams

2019-Aug-15 20:12 UTC

[Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

On Thu, Aug 15, 2019 at 12:44 PM Jerome Glisse <jglisse at redhat.com> wrote:
>
> On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> > On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse <jglisse at redhat.com> wrote:
> > >
> > > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg at mellanox.com> wrote:
> > > > >
> > > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > > Section alignment constraints somewhat save us here. The only example
> > > > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > > guarantee different mapping lifetimes.
> > > > > > >
> > > > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > > > ZONE_DEVICE".
> > > > > >
> > > > > > So I guess this patch is fine for now, and once you provide a better
> > > > > > mechanism we can switch over to it?
> > > > >
> > > > > What about the version I sent to just get rid of all the strange
> > > > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > > > a single pagemap, so it makes some sense to cache it once we find it?
> > > >
> > > > Yes, if the scan is over a single pmd then caching it makes sense.
> > >
> > > Quite frankly an easier an better solution is to remove the pagemap
> > > lookup as HMM user abide by mmu notifier it means we will not make
> > > use or dereference the struct page so that we are safe from any
> > > racing hotunplug of dax memory (as long as device driver using hmm
> > > do not have a bug).
> >
> > Yes, as long as the driver remove is synchronized against HMM
> > operations via another mechanism then there is no need to take pagemap
> > references. Can you briefly describe what that other mechanism is?
>
> So if you hotunplug some dax memory i assume that this can only
> happens once all the pages are unmapped (as it must have the
> zero refcount, well 1 because of the bias) and any unmap will
> trigger a mmu notifier callback. User of hmm mirror abiding by
> the API will never make use of information they get through the
> fault or snapshot function until checking for racing notifier
> under lock.

Hmm, that first assumption is not guaranteed by the dev_pagemap core.
The dev_pagemap end-of-life model is "disable, invalidate, drain", so
it's possible to call devm_memunmap_pages() while pages are still mapped;
it just won't complete the teardown of the pagemap until the last
reference is dropped. New references are blocked during this teardown.

However, if the driver is validating the liveness of the mapping in
the mmu-notifier path and blocking new references it sounds like it
should be ok. Might there be GPU driver unit tests that cover this
racing teardown case?
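
As a rough illustration of the "disable, invalidate, drain" model described above, here is a minimal single-threaded C sketch. It is not kernel code: `struct toy_pagemap` and the `toy_*` helpers are hypothetical names invented for this example, a plain counter stands in for the real reference counting, and locking and the refcount bias are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dev_pagemap-style object; not a kernel type. */
struct toy_pagemap {
    long refs;      /* outstanding references (mappings, in-flight users) */
    bool dying;     /* set by "disable": new references are refused */
    bool torn_down; /* set once the last reference has drained */
};

/* Take a reference only while the pagemap is still live. */
bool toy_tryget_live(struct toy_pagemap *p)
{
    if (p->dying)
        return false;
    p->refs++;
    return true;
}

/* Drop a reference; the last put after "disable" completes the teardown. */
void toy_put(struct toy_pagemap *p)
{
    assert(p->refs > 0);
    p->refs--;
    if (p->dying && p->refs == 0)
        p->torn_down = true;
}

/*
 * "Disable": block new references. Teardown completes immediately only if
 * nothing is outstanding; otherwise it waits for the drain in toy_put().
 */
void toy_kill(struct toy_pagemap *p)
{
    p->dying = true;
    if (p->refs == 0)
        p->torn_down = true;
}
```

With one reference outstanding, `toy_kill()` leaves `torn_down` false and makes `toy_tryget_live()` fail; the final `toy_put()` is what completes the teardown.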

Jerome Glisse

2019-Aug-15 20:33 UTC

On Thu, Aug 15, 2019 at 01:12:22PM -0700, Dan Williams wrote:
> On Thu, Aug 15, 2019 at 12:44 PM Jerome Glisse <jglisse at redhat.com> wrote:
> >
> > On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> > > On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse <jglisse at redhat.com> wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg at mellanox.com> wrote:
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > > > Section alignment constraints somewhat save us here. The only example
> > > > > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > > > guarantee different mapping lifetimes.
> > > > > > > >
> > > > > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > > > > ZONE_DEVICE".
> > > > > > >
> > > > > > > So I guess this patch is fine for now, and once you provide a better
> > > > > > > mechanism we can switch over to it?
> > > > > >
> > > > > > What about the version I sent to just get rid of all the strange
> > > > > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > > > > a single pagemap, so it makes some sense to cache it once we find it?
> > > > >
> > > > > Yes, if the scan is over a single pmd then caching it makes sense.
> > > >
> > > > Quite frankly an easier an better solution is to remove the pagemap
> > > > lookup as HMM user abide by mmu notifier it means we will not make
> > > > use or dereference the struct page so that we are safe from any
> > > > racing hotunplug of dax memory (as long as device driver using hmm
> > > > do not have a bug).
> > >
> > > Yes, as long as the driver remove is synchronized against HMM
> > > operations via another mechanism then there is no need to take pagemap
> > > references. Can you briefly describe what that other mechanism is?
> >
> > So if you hotunplug some dax memory i assume that this can only
> > happens once all the pages are unmapped (as it must have the
> > zero refcount, well 1 because of the bias) and any unmap will
> > trigger a mmu notifier callback. User of hmm mirror abiding by
> > the API will never make use of information they get through the
> > fault or snapshot function until checking for racing notifier
> > under lock.
>
> Hmm, that first assumption is not guaranteed by the dev_pagemap core.
> The dev_pagemap end-of-life model is "disable, invalidate, drain", so
> it's possible to call devm_memunmap_pages() while pages are still mapped;
> it just won't complete the teardown of the pagemap until the last
> reference is dropped. New references are blocked during this teardown.
>
> However, if the driver is validating the liveness of the mapping in
> the mmu-notifier path and blocking new references it sounds like it
> should be ok. Might there be GPU driver unit tests that cover this
> racing teardown case?

So neither HMM nor the driver should dereference the struct page (I do
not think any IOMMU driver would either); they only care about the pfn.
So even if we race with a teardown, as soon as we get the mmu notifier
callback to invalidate the mapping we will do so. The pattern is:

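    mydriver_populate_vaddr_range(start, end) {
        hmm_range_register(range, start, end)
    again:
        /* fault in / snapshot the CPU page table for the range */
        ret = hmm_range_fault(start, end)
        if (ret < 0)
            return ret;

        /* same lock the mmu notifier callback takes */
        take_driver_page_table_lock();
        if (range.valid) {
            /* no invalidation raced with us: the result can be used */
            populate_device_page_table();
            release_driver_page_table_lock();
        } else {
            /* raced with an invalidation: drop the result and retry */
            release_driver_page_table_lock();
            goto again;
        }
    }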

The mmu notifier callback uses the same page table lock, and we
also have the range tracking going on. So either we populate the
device page table before racing with teardown, in which case the
device page table entries are cleared through the mmu notifier
callback, or we race, in which case we see the racing mmu notifier
calls and retry, which will trigger a regular page fault that
will return an error, I assume.

So in the end we have the exact same behavior as if a CPU were trying
to access that virtual address. This is the whole point of HMM: to
behave exactly as if it were a CPU access. It fails in the same way
and races in the same way. So if DAX teardown is safe versus racing
CPU access to some vma that has that memory mapped, it will be the
same for HMM users.

GPU driver test suites are not good at testing this. They are geared
to test the GPU itself, not the interaction of the GPU driver with
the rest of the kernel.

Cheers,
Jérôme
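
The retry pattern described above can be sketched in user space as follows. This is only a single-threaded toy under stated assumptions, not the HMM API: `struct toy_mirror`, `toy_invalidate()` and `toy_populate()` are hypothetical names, the driver page table lock is reduced to a comment, and the racing invalidation is injected through a callback so that the retry path can be exercised deterministically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

/* Hypothetical mirror state; none of these names are kernel APIs. */
struct toy_mirror {
    unsigned long cpu_pfn[TOY_NPAGES]; /* "CPU page table", 0 = not mapped */
    unsigned long dev_pfn[TOY_NPAGES]; /* "device page table", 0 = not mapped */
    bool valid;                        /* cleared by every invalidation */
};

/* Stand-in for the mmu notifier callback: unmap and mark the range stale. */
void toy_invalidate(struct toy_mirror *m, size_t idx)
{
    assert(idx < TOY_NPAGES);
    m->cpu_pfn[idx] = 0;
    m->dev_pfn[idx] = 0;
    m->valid = false;
}

/*
 * Mirror one CPU entry into the device table. `race`, if non-NULL, is called
 * once between the snapshot and the validity check to simulate a racing
 * invalidation. Returns 0 on success, -1 if the address is not mapped.
 */
int toy_populate(struct toy_mirror *m, size_t idx,
                 void (*race)(struct toy_mirror *, size_t))
{
    assert(idx < TOY_NPAGES);
    for (;;) {
        m->valid = true;                     /* start tracking the range */
        unsigned long pfn = m->cpu_pfn[idx]; /* snapshot the CPU entry */
        if (pfn == 0)
            return -1;                       /* fails like a CPU fault would */

        if (race) {                          /* injected invalidation */
            race(m, idx);
            race = NULL;
        }

        /* A real driver would hold its page table lock from here on. */
        if (m->valid) {
            m->dev_pfn[idx] = pfn;           /* snapshot is still current */
            return 0;
        }
        /* Snapshot went stale: discard it and try again. */
    }
}
```

Without a race the entry is copied to the device table. With `toy_invalidate` passed as the `race` hook, the first pass is discarded, the retry finds the address unmapped, and the call fails without ever writing a device entry. Note that the sketch only models use of the pfn value; it says nothing about whether touching the struct page inside that window is safe.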

Jason Gunthorpe

2019-Aug-15 20:41 UTC

On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> So nor HMM nor driver should dereference the struct page (i do not
> think any iommu driver would either),

Er, they do technically deref the struct page:

nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
			 struct hmm_range *range)
		struct page *page;
		page = hmm_pfn_to_page(range, range->pfns[i]);
		if (!nouveau_dmem_page(drm, page)) {


nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
{
	return is_device_private_page(page) && drm->dmem == page_to_dmem(page)


Which does touch 'page->pgmap'

Is this OK without having a get_dev_pagemap()?

Noting that the collision-retry scheme doesn't protect anything here
as we can have a concurrent invalidation while doing the above deref.

Jason

Christoph Hellwig

2019-Aug-16 04:41 UTC

On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> So nor HMM nor driver should dereference the struct page (i do not
> think any iommu driver would either),

Both current hmm drivers convert the hmm pfn back to a page and
eventually call dma_map_page on it.  As do the ODP patches from you.
