thr3ads.net - Nouveau - [Nouveau] [PATCH 0/9] Add support for SVM atomics in Nouveau [Feb 2021]

Daniel Vetter

2021-Feb-09 13:37 UTC

[Nouveau] [PATCH 0/9] Add support for SVM atomics in Nouveau

On Tue, Feb 9, 2021 at 1:57 PM Alistair Popple <apopple at nvidia.com> wrote:
>
> On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
> > >
> > > Recent changes to pin_user_pages() prevent the creation of pinned pages in
> > > ZONE_MOVABLE. This series allows pinned pages to be created in ZONE_MOVABLE
> > > as attempts to migrate may fail which would be fatal to userspace.
> > >
> > > In this case migration of the pinned page is unnecessary as the page can be
> > > unpinned at anytime by having the driver revoke atomic permission as it
> > > does for the migrate_to_ram() callback. However a method of calling this
> > > when memory needs to be moved has yet to be resolved so any discussion is
> > > welcome.
> >
> > Why do we need to pin for gpu atomics? You still have the callback for
> > cpu faults, so you can move the page as needed, and hence a long-term pin
> > sounds like the wrong approach.
>
> Technically a real long term unmoveable pin isn't required, because as you say
> the page can be moved as needed at any time. However I needed some way of
> stopping the CPU page from being freed once the userspace mappings for it had
> been removed. Obviously I could have just used get_page() but from the
> perspective of page migration the result is much the same as a pin - a page
> which can't be moved because of the extra refcount.

long term pin vs short term page reference aren't fully fleshed out.
But the rule more or less is:
- short term page reference: _must_ get released in finite time for
migration and other things, either because you have a callback, or
because it's just for direct I/O, which will complete. This means
short term pins will delay migration, but not foil it completely.

- long term pin: the page cannot be moved, all migration must fail.
Also this will have an impact on COW behaviour for fork (but not sure
where those patches are, John Hubbard will know).

So I think for your use case here you want a) short term page
reference to make sure it doesn't disappear plus b) callback to make
sure migrate isn't blocked.
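
To make the distinction concrete, here is a toy userspace model in C of the
rule above. It is only a sketch: none of these names (toy_page,
toy_get_short_term, toy_driver_revoke, toy_migrate) exist in the kernel, and
real migration checks far more than a single refcount. It just shows that a
short term reference with a release callback delays migration, while a long
term pin, or a reference nobody can be asked to drop, makes it fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_page;
typedef void (*toy_release_fn)(struct toy_page *page);

struct toy_page {
    int refcount;            /* 1 == only the owner's base reference */
    bool longterm_pinned;    /* models a long term pin: must never move */
    toy_release_fn release;  /* optional: asks the holder to drop its ref */
    int migrated;            /* set to 1 once the page has been "moved" */
};

/* Take a short term reference and record how to get it released. */
static void toy_get_short_term(struct toy_page *page, toy_release_fn release)
{
    page->refcount++;
    page->release = release;
}

/* Models a driver revoking GPU atomic access and dropping its reference. */
static void toy_driver_revoke(struct toy_page *page)
{
    page->refcount--;
    page->release = NULL;
}

/* Returns true if the page could be moved. */
static bool toy_migrate(struct toy_page *page)
{
    if (page->longterm_pinned)
        return false;          /* long term pin: migration must fail */
    if (page->refcount > 1 && page->release)
        page->release(page);   /* short term ref: ask the holder to let go */
    if (page->refcount > 1)
        return false;          /* extra ref still held: cannot move now */
    page->migrated = 1;
    return true;
}
```

With toy_driver_revoke registered as the callback, toy_migrate() succeeds once
the holder lets go. With no callback, or with longterm_pinned set, it returns
false, which is the case that would break ZONE_MOVABLE.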

Breaking ZONE_MOVABLE with either allowing long term pins or failing
migrations because you don't release your short term page reference
isn't good.

> The normal solution of registering an MMU notifier to unpin the page when it
> needs to be moved also doesn't work as the CPU page tables now point to the
> device-private page and hence the migration code won't call any invalidate
> notifiers for the CPU page.

Yeah you need some other callback for migration on the page directly.
It's a bit awkward since there is one already for struct
address_space, but that's owned by the address_space/page cache, not
HMM. So I think we need something else, maybe something for each
ZONE_DEVICE?

> > That would avoid all the hacking around long term pin constraints, because
> > for real unmoveable long term pinned memory we really want to have all
> > these checks. So I think we might be missing some other callbacks to be
> > able to move these pages, instead of abusing longterm pins for lack of
> > better tools.
>
> Yes, I would like to avoid the long term pin constraints as well if possible I
> just haven't found a solution yet. Are you suggesting it might be possible to
> add a callback in the page migration logic to specially deal with moving these
> pages?

s/possible/need to type some code to address it/ I think.

But also I'm not much of an expert on this, I've only just started
learning how this all fits together coming from the gpu side. There's
a _lot_ of finesse involved.

Cheers, Daniel
>
> Thanks, Alistair
>
> > Cheers, Daniel
> >
> >
> >
> > >
> > > Alistair Popple (9):
> > >   mm/migrate.c: Always allow device private pages to migrate
> > >   mm/migrate.c: Allow pfn flags to be passed to migrate_vma_setup()
> > >   mm/migrate: Add a unmap and pin migration mode
> > >   Documentation: Add unmap and pin to HMM
> > >   hmm-tests: Add test for unmap and pin
> > >   nouveau/dmem: Only map migrating pages
> > >   nouveau/svm: Refactor nouveau_range_fault
> > >   nouveau/dmem: Add support for multiple page types
> > >   nouveau/svm: Implement atomic SVM access
> > >
> > >  Documentation/vm/hmm.rst                      |  22 +-
> > >  arch/powerpc/kvm/book3s_hv_uvmem.c            |   4 +-
> > >  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
> > >  drivers/gpu/drm/nouveau/nouveau_dmem.c        | 190 +++++++++++++++---
> > >  drivers/gpu/drm/nouveau/nouveau_dmem.h        |   9 +
> > >  drivers/gpu/drm/nouveau/nouveau_svm.c         | 148 +++++++++++---
> > >  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
> > >  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
> > >  include/linux/migrate.h                       |   2 +
> > >  include/linux/migrate_mode.h                  |   1 +
> > >  lib/test_hmm.c                                | 109 ++++++++--
> > >  lib/test_hmm_uapi.h                           |   1 +
> > >  mm/migrate.c                                  |  82 +++++---
> > >  tools/testing/selftests/vm/hmm-tests.c        |  49 +++++
> > >  14 files changed, 524 insertions(+), 101 deletions(-)
> > >
> > > --
> > > 2.20.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>
>
>
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

John Hubbard

2021-Feb-09 20:53 UTC

[Nouveau] [PATCH 0/9] Add support for SVM atomics in Nouveau

On 2/9/21 5:37 AM, Daniel Vetter wrote:
> On Tue, Feb 9, 2021 at 1:57 PM Alistair Popple <apopple at nvidia.com> wrote:
>>
>> On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
>>>>
>>>> Recent changes to pin_user_pages() prevent the creation of pinned pages in
>>>> ZONE_MOVABLE. This series allows pinned pages to be created in ZONE_MOVABLE
>>>> as attempts to migrate may fail which would be fatal to userspace.
>>>>
>>>> In this case migration of the pinned page is unnecessary as the page can be
>>>> unpinned at anytime by having the driver revoke atomic permission as it
>>>> does for the migrate_to_ram() callback. However a method of calling this
>>>> when memory needs to be moved has yet to be resolved so any discussion is
>>>> welcome.
>>>
>>> Why do we need to pin for gpu atomics? You still have the callback for
>>> cpu faults, so you can move the page as needed, and hence a long-term pin
>>> sounds like the wrong approach.
>>
>> Technically a real long term unmoveable pin isn't required, because as you say
>> the page can be moved as needed at any time. However I needed some way of
>> stopping the CPU page from being freed once the userspace mappings for it had
>> been removed. Obviously I could have just used get_page() but from the
>> perspective of page migration the result is much the same as a pin - a page
>> which can't be moved because of the extra refcount.
> 
> long term pin vs short term page reference aren't fully fleshed out.
> But the rule more or less is:
> - short term page reference: _must_ get released in finite time for
> migration and other things, either because you have a callback, or
> because it's just for direct I/O, which will complete. This means
> short term pins will delay migration, but not foil it completely.

GPU atomic operations to sysmem are hard to categorize, because application
programmers could easily write programs that do a long series of atomic
operations. Such a program would be a little weird, but it's hard to rule out.

> 
> - long term pin: the page cannot be moved, all migration must fail.
> Also this will have an impact on COW behaviour for fork (but not sure
> where those patches are, John Hubbard will know).

That would be Jason's commit 57efa1fe59576 ("mm/gup: prevent gup_fast from
racing with COW during fork"), which is in linux-next 20201216.

> 
> So I think for your use case here you want a) short term page
> reference to make sure it doesn't disappear plus b) callback to make
> sure migrate isn't blocked.
> 
> Breaking ZONE_MOVABLE with either allowing long term pins or failing
> migrations because you don't release your short term page reference
> isn't good.
> 
>> The normal solution of registering an MMU notifier to unpin the page when it
>> needs to be moved also doesn't work as the CPU page tables now point to the
>> device-private page and hence the migration code won't call any invalidate
>> notifiers for the CPU page.
> 
> Yeah you need some other callback for migration on the page directly.
> It's a bit awkward since there is one already for struct
> address_space, but that's owned by the address_space/page cache, not
> HMM. So I think we need something else, maybe something for each
> ZONE_DEVICE?
> 
This direction sounds at least...possible. Using MMU notifiers instead of pins
is definitely appealing. I'm not quite clear on the callback idea above, but
overall it seems like taking advantage of the ZONE_DEVICE tracking of pages
(without having to put anything additional in each struct page), could work.
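
A rough userspace sketch of that idea in C, purely for discussion. It assumes
a hypothetical per-zone ops table with a prepare_move hook; none of these
names (toy_zone, toy_zone_ops, toy_find_zone, toy_prepare_move) are real
kernel symbols. The point is only that the owner can be found from the page
frame number via the zone's range, so nothing extra is stored per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_zone;

struct toy_zone_ops {
    /* Hypothetical hook: asks the owner to make this pfn movable. */
    bool (*prepare_move)(struct toy_zone *zone, unsigned long pfn);
};

struct toy_zone {
    unsigned long start_pfn;        /* first pfn covered by the zone */
    unsigned long end_pfn;          /* one past the last pfn covered */
    const struct toy_zone_ops *ops; /* owner callbacks, may be NULL */
    unsigned long last_revoked_pfn; /* bookkeeping for the example only */
    int revoke_count;
};

/* Find the zone covering pfn by range; no per-page state is consulted. */
static struct toy_zone *toy_find_zone(struct toy_zone *zones, size_t n,
                                      unsigned long pfn)
{
    for (size_t i = 0; i < n; i++)
        if (pfn >= zones[i].start_pfn && pfn < zones[i].end_pfn)
            return &zones[i];
    return NULL;
}

/* Example owner hook: models revoking access and dropping the reference. */
static bool toy_driver_prepare_move(struct toy_zone *zone, unsigned long pfn)
{
    zone->revoke_count++;
    zone->last_revoked_pfn = pfn;
    return true;
}

/* Called by the (toy) migration core before it tries to move pfn. */
static bool toy_prepare_move(struct toy_zone *zones, size_t n,
                             unsigned long pfn)
{
    struct toy_zone *zone = toy_find_zone(zones, n, pfn);

    if (!zone || !zone->ops || !zone->ops->prepare_move)
        return true;   /* no owner hook: treat as an ordinary page */
    return zone->ops->prepare_move(zone, pfn);
}
```

A pfn inside a zone with a hook triggers the owner's callback exactly once per
call; a pfn outside every zone passes straight through.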

Additional notes or ideas here are definitely welcome.



thanks,
-- 
John Hubbard
NVIDIA
