Jason Gunthorpe
2025-Jan-30 17:42 UTC
[RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
On Thu, Jan 30, 2025 at 05:09:44PM +0100, Simona Vetter wrote:

> > You could also use an integer instead of a pointer to indicate the
> > cluster of interconnect, I think there are many options..
>
> Hm yeah, I guess an integer allocator of the atomic_inc kind plus "surely
> 32bit is enough" could work. But I don't think it's needed if we can
> reliably just unregister the entire dev_pagemap and then set up a new
> one. Plus that avoids thinking about which barriers we might need where
> exactly all over mm code that looks at the owner field.

IMHO that is the best answer, if it works for the driver.

> > ? It is supposed to work, it blocks until all the pages are freed, but
> > AFAIK there is no fundamental lifetime issue. The driver is
> > responsible to free all its usage.
>
> Hm, I looked at it again, and I guess with the fixes to make migration to
> system memory work reliably in Matt Brost's latest series it should indeed
> work reliably. The devm_ version still freaks me out because of how easily
> people misuse these for things that are memory allocations.

I also don't like the devm stuff, especially in costly places like this.
Oh well.

> > > An optional callback here (or redoing hmm_range_fault or whacking
> > > the migration helpers a few times) looks a lot less scary to me
> > > than making pgmap->owner mutable in some fashion.
> >
> > It's extra for every single 4k page on every user :\
> >
> > And what are you going to do better inside this callback?
>
> Having more comfy illusions :-P

Exactly!

> Slightly more seriously, I can grab some locks and make life easier,

Yes, but then see my concern about performance again. Now you are
locking/unlocking every 4k? And then it still races since it can change
after hmm_range_fault returns. That's not small, and not really better.

> whereas sprinkling locking or even barriers over pgmap->owner in core mm
> is not going to fly. Plus more flexibility, e.g. when the interconnect
> doesn't work for atomics, or for some other funny reason only works for
> some of the traffic, where you need to decide more dynamically where
> memory is ok to sit.

Sure, an asymmetric mess could be problematic, and we might need more
later, but let's get to that first..

> Or checking p2pdma connectivity and all that stuff.

Should be done in the core code, don't want drivers open coding this
stuff.

> Also note that fundamentally you can't protect against the hotunplug or
> driver unload case for hardware access. So some access will go to nowhere
> when that happens, until we've torn down all the mappings and migrated
> memory out.

I think a literal (safe!) hot unplug must always use the page map revoke,
and that should be safely locked between hmm_range_fault and the notifier.

If the underlying fabric is losing operations during an unplanned hot
unplug, I expect things will need resets to recover..

Jason
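For readers following along, here is a minimal, hedged sketch of how the
pgmap->owner / dev_private_owner matching discussed above is typically
wired up on the mirroring side. struct hmm_range, its dev_private_owner
field and hmm_range_fault() are the real interfaces; the example_* names
and the per-cluster token are invented for illustration only.

#include <linux/hmm.h>
#include <linux/mmu_notifier.h>

/* One opaque token per cluster of devices sharing a private interconnect. */
static int example_interconnect_token;

/*
 * Mirror [start, end) into a device page table. The caller holds the mmap
 * read lock and retries on notifier collisions, as usual for
 * hmm_range_fault() users.
 */
static int example_mirror_range(struct mmu_interval_notifier *notifier,
				unsigned long start, unsigned long end,
				unsigned long *pfns)
{
	struct hmm_range range = {
		.notifier	   = notifier,
		.start		   = start,
		.end		   = end,
		.hmm_pfns	   = pfns,
		.default_flags	   = HMM_PFN_REQ_FAULT,
		/*
		 * Device-private pages whose pgmap->owner matches this
		 * pointer are returned as-is; pages owned by anyone else
		 * are faulted back to system memory via the owning
		 * driver's migrate_to_ram() callback.
		 */
		.dev_private_owner = &example_interconnect_token,
	};

	range.notifier_seq = mmu_interval_read_begin(notifier);
	return hmm_range_fault(&range);
}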
Simona Vetter
2025-Jan-31 16:59 UTC
[RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
On Thu, Jan 30, 2025 at 01:42:17PM -0400, Jason Gunthorpe wrote:

> On Thu, Jan 30, 2025 at 05:09:44PM +0100, Simona Vetter wrote:
> > > > An optional callback here (or redoing hmm_range_fault or whacking
> > > > the migration helpers a few times) looks a lot less scary to me
> > > > than making pgmap->owner mutable in some fashion.
> > >
> > > It's extra for every single 4k page on every user :\
> > >
> > > And what are you going to do better inside this callback?
> >
> > Having more comfy illusions :-P
>
> Exactly!
>
> > Slightly more seriously, I can grab some locks and make life easier,
>
> Yes, but then see my concern about performance again. Now you are
> locking/unlocking every 4k? And then it still races since it can change
> after hmm_range_fault returns. That's not small, and not really better.

Hm yeah, I think that's the death argument for the callback. Consider me
convinced that it's a bad idea.

> > whereas sprinkling locking or even barriers over pgmap->owner in core mm
> > is not going to fly. Plus more flexibility, e.g. when the interconnect
> > doesn't work for atomics, or for some other funny reason only works for
> > some of the traffic, where you need to decide more dynamically where
> > memory is ok to sit.
>
> Sure, an asymmetric mess could be problematic, and we might need more
> later, but let's get to that first..
>
> > Or checking p2pdma connectivity and all that stuff.
>
> Should be done in the core code, don't want drivers open coding this
> stuff.

Yeah, after amdkfd and nouveau I agree that letting drivers mess this up
isn't great. But see below, I'm not sure whether going all the way to core
code is the right approach, at least for gpu-internal needs.

> > Also note that fundamentally you can't protect against the hotunplug or
> > driver unload case for hardware access. So some access will go to nowhere
> > when that happens, until we've torn down all the mappings and migrated
> > memory out.
>
> I think a literal (safe!) hot unplug must always use the page map revoke,
> and that should be safely locked between hmm_range_fault and the notifier.
>
> If the underlying fabric is losing operations during an unplanned hot
> unplug, I expect things will need resets to recover..

So one aspect where I don't like the pgmap->owner approach much is that
it's a big thing to get right, and it feels a bit to me that we don't yet
know the right questions.

A bit related: we'll have to do some driver-specific migration after
hmm_range_fault anyway for allocation policies. With a coherent
interconnect that'd be up to numactl, but for driver-private memory it's
all up to the driver. And once we have that, we can also migrate memory
around that's misplaced for functional and not just performance reasons.

The plan I discussed with Thomas a while back, at least for gpus, was to
have that as a drm_devpagemap library, which would have a common owner (or
maybe one per driver, as Thomas suggested). Then it would still not be in
drivers, but also a bit easier to mess around with for experiments. And
once we have some clear things that hmm_range_fault should do instead for
everyone, we can lift them up.

Doing this at a pagemap level should also be much more efficient, since I
think we can make the assumption that access limitations are uniform for a
given dev_pagemap (and if they're not, e.g. if not the entire vram is bus
visible, drivers can handle that by splitting things up).
But upfront speccing all this out doesn't seem like a good idea to me,
because I honestly don't know what we all need.

Cheers, Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
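To make the driver-side migration step mentioned above a bit more
concrete, here is a rough, hedged sketch of pulling a single page into
device memory with the migrate_vma_*() helpers. migrate_vma_setup(),
migrate_vma_pages() and migrate_vma_finalize() are the real interfaces;
example_alloc_device_mpfn() is a made-up placeholder for the driver
allocating, locking and copy-scheduling a device-private page.

#include <linux/migrate.h>
#include <linux/memremap.h>
#include <linux/mm.h>

/* Hypothetical helper: returns a migrate_pfn() for a fresh device page. */
static unsigned long example_alloc_device_mpfn(struct dev_pagemap *pgmap);

/*
 * Pull one page at @addr into device-private memory as an allocation
 * policy decision. The caller holds the mmap lock for @vma.
 */
static int example_migrate_to_vram(struct vm_area_struct *vma,
				   unsigned long addr,
				   struct dev_pagemap *pgmap)
{
	unsigned long src = 0, dst = 0;
	struct migrate_vma args = {
		.vma		= vma,
		.start		= addr,
		.end		= addr + PAGE_SIZE,
		.src		= &src,
		.dst		= &dst,
		.pgmap_owner	= pgmap->owner,
		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
	};
	int ret;

	ret = migrate_vma_setup(&args);
	if (ret)
		return ret;

	if (src & MIGRATE_PFN_MIGRATE) {
		/*
		 * A real driver allocates and locks a device-private page
		 * from its pgmap here, schedules the copy, and encodes the
		 * destination with migrate_pfn().
		 */
		dst = example_alloc_device_mpfn(pgmap);
	}

	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);
	return 0;
}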
Jason Gunthorpe
2025-Feb-03 15:08 UTC
[RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
On Fri, Jan 31, 2025 at 05:59:26PM +0100, Simona Vetter wrote:

> So one aspect where I don't like the pgmap->owner approach much is that
> it's a big thing to get right, and it feels a bit to me that we don't yet
> know the right questions.

Well, I would say it isn't really complete yet. No driver has yet
attempted to use a private interconnect with this scheme. Probably it
needs more work.

> A bit related: we'll have to do some driver-specific migration after
> hmm_range_fault anyway for allocation policies. With a coherent
> interconnect that'd be up to numactl, but for driver-private memory it's
> all up to the driver. And once we have that, we can also migrate memory
> around that's misplaced for functional and not just performance reasons.

Are you sure? This doesn't seem to be what any hmm_range_fault() user
should be doing. hmm_range_fault() is there to help mirror the page table
to a secondary, that is all. Migration policy shouldn't be part of it;
just mirroring doesn't necessarily mean any access was performed, for
instance. And mirroring doesn't track any access done by non-faulting
cases either.

> The plan I discussed with Thomas a while back, at least for gpus, was to
> have that as a drm_devpagemap library,

I would not be happy to see this. Please improve pagemap directly if you
think you need more things.

> which would have a common owner (or
> maybe one per driver, as Thomas suggested).

Neither really matches the expected design here. The owner should be
entirely based on reachability. Devices that cannot reach each other
directly should have different owners.

> But upfront speccing all this out doesn't seem like a good idea to me,
> because I honestly don't know what we all need.

This is why it is currently just void *owner :)

Jason
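As a footnote on the reachability rule above, a hedged sketch of what
"one owner per reachability cluster" can look like when a driver sets up
its device-private pagemap. struct dev_pagemap and its owner field are
real; example_fabric and example_init_pagemap are invented for
illustration.

#include <linux/memremap.h>
#include <linux/ioport.h>

/* One instance per private interconnect, i.e. per reachability cluster. */
struct example_fabric {
	int id;	/* informational only; the pointer identity is the owner */
};

static void example_init_pagemap(struct dev_pagemap *pgmap,
				 struct example_fabric *fabric,
				 struct resource *res)
{
	pgmap->type = MEMORY_DEVICE_PRIVATE;
	pgmap->range.start = res->start;
	pgmap->range.end = res->end;
	pgmap->nr_range = 1;
	/* a real driver also supplies pgmap->ops (migrate_to_ram, page_free) */

	/*
	 * Devices that can reach each other directly pass the same fabric
	 * pointer, so hmm_range_fault()/migrate_vma treat their pages as
	 * usable in place; devices on different fabrics get different
	 * owners and their pages are migrated to system memory first.
	 */
	pgmap->owner = fabric;
}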