Jason Gunthorpe
2025-Jan-30 17:42 UTC
[RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
On Thu, Jan 30, 2025 at 05:09:44PM +0100, Simona Vetter wrote:

> > You could also use an integer instead of a pointer to indicate the
> > cluster of interconnect, I think there are many options..
>
> Hm yeah, I guess an integer allocator of the atomic_inc kind plus "surely
> 32bit is enough" could work. But I don't think it's needed if we can
> reliably just unregister the entire dev_pagemap and then set up a new
> one. Plus that avoids thinking about which barriers we might need where
> exactly all over mm code that looks at the owner field.

IMHO that is the best answer, if it works for the driver.

> > ? It is supposed to work, it blocks until all the pages are freed, but
> > AFAIK there is no fundamental lifetime issue. The driver is
> > responsible to free all its usage.
>
> Hm, I looked at it again, and I guess with the fixes to make migration to
> system memory work reliably in Matt Brost's latest series it should indeed
> work reliably. The devm_ version still freaks me out because of how easily
> people misuse these for things that are memory allocations.

I also don't like the devm stuff, especially in costly places like this.
Oh well.

> > > An optional callback here (or redoing hmm_range_fault or whacking
> > > the migration helpers a few times) looks a lot less scary to me
> > > than making pgmap->owner mutable in some fashion.
> >
> > It's extra for every single 4k page on every user :\
> >
> > And what are you going to do better inside this callback?
>
> Having more comfy illusions :-P

Exactly!

> Slightly more seriously, I can grab some locks and make life easier,

Yes, but then see my concern about performance again. Now you are
locking/unlocking every 4k? And then it still races since it can change
after hmm_range_fault returns. That's not small, and not really better.

> whereas sprinkling locking or even barriers over pgmap->owner in core mm
> is not going to fly. Plus more flexibility, e.g. when the interconnect
> doesn't work for atomics, or for some other funny reason only works for
> some of the traffic, where you need to decide more dynamically where
> memory is ok to sit.

Sure, an asymmetric mess could be problematic, and we might need more
later, but let's get to that first..

> Or checking p2pdma connectivity and all that stuff.

Should be done in the core code, don't want drivers open coding this
stuff.

> Also note that fundamentally you can't protect against the hotunplug or
> driver unload case for hardware access. So some access will go to nowhere
> when that happens, until we've torn down all the mappings and migrated
> memory out.

I think a literal (safe!) hot unplug must always use the page map revoke,
and that should be safely locked between hmm_range_fault and the notifier.

If the underlying fabric is losing operations during an unplanned hot
unplug, I expect things will need resets to recover..

Jason
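For readers following along, here is a minimal, hedged sketch of how the
pgmap->owner / dev_private_owner matching discussed above is typically
wired up on the mirroring side. struct hmm_range, its dev_private_owner
field and hmm_range_fault() are the real interfaces; the example_* names
and the per-cluster token are invented for illustration only.

#include <linux/hmm.h>
#include <linux/mmu_notifier.h>

/* One opaque token per cluster of devices sharing a private interconnect. */
static int example_interconnect_token;

/*
 * Mirror [start, end) into a device page table. The caller holds the mmap
 * read lock and retries on notifier collisions, as usual for
 * hmm_range_fault() users.
 */
static int example_mirror_range(struct mmu_interval_notifier *notifier,
				unsigned long start, unsigned long end,
				unsigned long *pfns)
{
	struct hmm_range range = {
		.notifier	   = notifier,
		.start		   = start,
		.end		   = end,
		.hmm_pfns	   = pfns,
		.default_flags	   = HMM_PFN_REQ_FAULT,
		/*
		 * Device-private pages whose pgmap->owner matches this
		 * pointer are returned as-is; pages owned by anyone else
		 * are faulted back to system memory via the owning
		 * driver's migrate_to_ram() callback.
		 */
		.dev_private_owner = &example_interconnect_token,
	};

	range.notifier_seq = mmu_interval_read_begin(notifier);
	return hmm_range_fault(&range);
}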
Simona Vetter
2025-Jan-31 16:59 UTC
[RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
On Thu, Jan 30, 2025 at 01:42:17PM -0400, Jason Gunthorpe wrote:

> On Thu, Jan 30, 2025 at 05:09:44PM +0100, Simona Vetter wrote:
> > > > An optional callback here (or redoing hmm_range_fault or whacking
> > > > the migration helpers a few times) looks a lot less scary to me
> > > > than making pgmap->owner mutable in some fashion.
> > >
> > > It's extra for every single 4k page on every user :\
> > >
> > > And what are you going to do better inside this callback?
> >
> > Having more comfy illusions :-P
>
> Exactly!
>
> > Slightly more seriously, I can grab some locks and make life easier,
>
> Yes, but then see my concern about performance again. Now you are
> locking/unlocking every 4k? And then it still races since it can change
> after hmm_range_fault returns. That's not small, and not really better.

Hm yeah, I think that's the death argument for the callback. Consider me
convinced that it's a bad idea.

> > whereas sprinkling locking or even barriers over pgmap->owner in core mm
> > is not going to fly. Plus more flexibility, e.g. when the interconnect
> > doesn't work for atomics, or for some other funny reason only works for
> > some of the traffic, where you need to decide more dynamically where
> > memory is ok to sit.
>
> Sure, an asymmetric mess could be problematic, and we might need more
> later, but let's get to that first..
>
> > Or checking p2pdma connectivity and all that stuff.
>
> Should be done in the core code, don't want drivers open coding this
> stuff.

Yeah, after amdkfd and nouveau I agree that letting drivers mess this up
isn't great. But see below, I'm not sure whether going all the way to core
code is the right approach, at least for gpu-internal needs.

> > Also note that fundamentally you can't protect against the hotunplug or
> > driver unload case for hardware access. So some access will go to nowhere
> > when that happens, until we've torn down all the mappings and migrated
> > memory out.
>
> I think a literal (safe!) hot unplug must always use the page map revoke,
> and that should be safely locked between hmm_range_fault and the notifier.
>
> If the underlying fabric is losing operations during an unplanned hot
> unplug, I expect things will need resets to recover..

So one aspect where I don't like the pgmap->owner approach much is that
it's a big thing to get right, and it feels a bit to me that we don't yet
know the right questions.

A bit related: we'll have to do some driver-specific migration after
hmm_range_fault anyway for allocation policies. With a coherent
interconnect that'd be up to numactl, but for driver-private memory it's
all up to the driver. And once we have that, we can also migrate memory
around that's misplaced for functional and not just performance reasons.

The plan I discussed with Thomas a while back, at least for gpus, was to
have that as a drm_devpagemap library, which would have a common owner (or
maybe one per driver, as Thomas suggested). Then it would still not be in
drivers, but also a bit easier to mess around with for experiments. And
once we have some clear things that hmm_range_fault should do instead for
everyone, we can lift them up.

Doing this at a pagemap level should also be much more efficient, since I
think we can make the assumption that access limitations are uniform for a
given dev_pagemap (and if they're not, e.g. if not the entire vram is bus
visible, drivers can handle that by splitting things up).
But upfront speccing all this out doesn't seem like a good idea to me,
because I honestly don't know what we all need.

Cheers, Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
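To make the driver-side migration step mentioned above a bit more
concrete, here is a rough, hedged sketch of pulling a single page into
device memory with the migrate_vma_*() helpers. migrate_vma_setup(),
migrate_vma_pages() and migrate_vma_finalize() are the real interfaces;
example_alloc_device_mpfn() is a made-up placeholder for the driver
allocating, locking and copy-scheduling a device-private page.

#include <linux/migrate.h>
#include <linux/memremap.h>
#include <linux/mm.h>

/* Hypothetical helper: returns a migrate_pfn() for a fresh device page. */
static unsigned long example_alloc_device_mpfn(struct dev_pagemap *pgmap);

/*
 * Pull one page at @addr into device-private memory as an allocation
 * policy decision. The caller holds the mmap lock for @vma.
 */
static int example_migrate_to_vram(struct vm_area_struct *vma,
				   unsigned long addr,
				   struct dev_pagemap *pgmap)
{
	unsigned long src = 0, dst = 0;
	struct migrate_vma args = {
		.vma		= vma,
		.start		= addr,
		.end		= addr + PAGE_SIZE,
		.src		= &src,
		.dst		= &dst,
		.pgmap_owner	= pgmap->owner,
		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
	};
	int ret;

	ret = migrate_vma_setup(&args);
	if (ret)
		return ret;

	if (src & MIGRATE_PFN_MIGRATE) {
		/*
		 * A real driver allocates and locks a device-private page
		 * from its pgmap here, schedules the copy, and encodes the
		 * destination with migrate_pfn().
		 */
		dst = example_alloc_device_mpfn(pgmap);
	}

	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);
	return 0;
}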
Jason Gunthorpe
2025-Feb-03 15:08 UTC
[RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
On Fri, Jan 31, 2025 at 05:59:26PM +0100, Simona Vetter wrote:

> So one aspect where I don't like the pgmap->owner approach much is that
> it's a big thing to get right, and it feels a bit to me that we don't yet
> know the right questions.

Well, I would say it isn't really complete yet. No driver has yet
attempted to use a private interconnect with this scheme. Probably it
needs more work.

> A bit related: we'll have to do some driver-specific migration after
> hmm_range_fault anyway for allocation policies. With a coherent
> interconnect that'd be up to numactl, but for driver-private memory it's
> all up to the driver. And once we have that, we can also migrate memory
> around that's misplaced for functional and not just performance reasons.

Are you sure? This doesn't seem to be what any hmm_range_fault() user
should be doing. hmm_range_fault() is there to help mirror the page table
to a secondary, that is all. Migration policy shouldn't be part of it;
just mirroring doesn't necessarily mean any access was performed, for
instance. And mirroring doesn't track any access done by non-faulting
cases either.

> The plan I discussed with Thomas a while back, at least for gpus, was to
> have that as a drm_devpagemap library,

I would not be happy to see this. Please improve pagemap directly if you
think you need more things.

> which would have a common owner (or
> maybe one per driver, as Thomas suggested).

Neither really matches the expected design here. The owner should be
entirely based on reachability. Devices that cannot reach each other
directly should have different owners.

> But upfront speccing all this out doesn't seem like a good idea to me,
> because I honestly don't know what we all need.

This is why it is currently just void *owner :)

Jason
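As a footnote on the reachability rule above, a hedged sketch of what
"one owner per reachability cluster" can look like when a driver sets up
its device-private pagemap. struct dev_pagemap and its owner field are
real; example_fabric and example_init_pagemap are invented for
illustration.

#include <linux/memremap.h>
#include <linux/ioport.h>

/* One instance per private interconnect, i.e. per reachability cluster. */
struct example_fabric {
	int id;	/* informational only; the pointer identity is the owner */
};

static void example_init_pagemap(struct dev_pagemap *pgmap,
				 struct example_fabric *fabric,
				 struct resource *res)
{
	pgmap->type = MEMORY_DEVICE_PRIVATE;
	pgmap->range.start = res->start;
	pgmap->range.end = res->end;
	pgmap->nr_range = 1;
	/* a real driver also supplies pgmap->ops (migrate_to_ram, page_free) */

	/*
	 * Devices that can reach each other directly pass the same fabric
	 * pointer, so hmm_range_fault()/migrate_vma treat their pages as
	 * usable in place; devices on different fabrics get different
	 * owners and their pages are migrated to system memory first.
	 */
	pgmap->owner = fabric;
}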