David Hildenbrand
2025-Jan-30 09:37 UTC
[PATCH v1 4/4] mm/memory: document restore_exclusive_pte()
On 30.01.25 01:27, Alistair Popple wrote:> On Wed, Jan 29, 2025 at 12:58:02PM +0100, David Hildenbrand wrote: >> Let's document how this function is to be used, and why the requirement >> for the folio lock might maybe be dropped in the future. > > Sorry, only just catching up on your other thread. The folio lock was to ensure > the GPU got a chance to make forward progress by mapping the page. Without it > the CPU could immediately invalidate the entry before the GPU had a chance to > retry the fault.> > Obviously performance wise having such thrashing is terrible, so should> really be avoided by userspace, but the lock at least allowed such programs > to complete.Thanks for the clarification. So it's relevant that the MMU notifier in remove_device_exclusive_entry() is sent after taking the folio lock. However, as soon as we drop the folio lock, remove_device_exclusive_entry() will become active, lock the folio and trigger the MMU notifier. So the time it is actually mapped into the device is rather> >> Signed-off-by: David Hildenbrand <david at redhat.com> >> --- >> mm/memory.c | 25 +++++++++++++++++++++++++ >> 1 file changed, 25 insertions(+) >> >> diff --git a/mm/memory.c b/mm/memory.c >> index 46956994aaff..caaae8df11a9 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -718,6 +718,31 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma, >> } >> #endif >> >> +/** >> + * restore_exclusive_pte - Restore a device-exclusive entry >> + * @vma: VMA covering @address >> + * @folio: the mapped folio >> + * @page: the mapped folio page >> + * @address: the virtual address >> + * @ptep: PTE pointer into the locked page table mapping the folio page >> + * @orig_pte: PTE value at @ptep >> + * >> + * Restore a device-exclusive non-swap entry to an ordinary present PTE. >> + * >> + * The folio and the page table must be locked, and MMU notifiers must have >> + * been called to invalidate any (exclusive) device mappings. In case of >> + * fork(), MMU_NOTIFY_PROTECTION_PAGE is triggered, and in case of a page >> + * fault MMU_NOTIFY_EXCLUSIVE is triggered. >> + * >> + * Locking the folio makes sure that anybody who just converted the PTE to >> + * a device-private entry can map it into the device, before unlocking it; so >> + * the folio lock prevents concurrent conversion to device-exclusive. > > I don't quite follow this - a concurrent conversion would already fail > because the GUP in make_device_exclusive_range() would most likely cause > an unexpected reference during the migration. And if a migration entry > has already been installed for the device private PTE conversion then > make_device_exclusive_range() will skip it as a non-present entry anyway.Sorry, I meant "device-exclusive", so migration is not a concern.> > However s/device-private/device-exclusive/ makes sense - the intent was to allow > the device to map it before a call to restore_exclusive_pte() (ie. a CPU fault) > could convert it back to a normal PTE. > >> + * TODO: the folio lock does not protect against all cases of concurrent >> + * page table modifications (e.g., MADV_DONTNEED, mprotect), so device drivers >> + * must already use MMU notifiers to sync against any concurrent changes > > Right. It's expected drivers are using MMU notifiers to keep page tables in > sync, same as for hmm_range_fault().Let me try to rephrase it given that the folio lock is purely to guarantee forward-progress, not for correctness; that's what MMU notifiers must be used for. -- Cheers, David / dhildenb
Simona Vetter
2025-Jan-30 13:31 UTC
[PATCH v1 4/4] mm/memory: document restore_exclusive_pte()
On Thu, Jan 30, 2025 at 10:37:06AM +0100, David Hildenbrand wrote:> On 30.01.25 01:27, Alistair Popple wrote: > > On Wed, Jan 29, 2025 at 12:58:02PM +0100, David Hildenbrand wrote: > > > Let's document how this function is to be used, and why the requirement > > > for the folio lock might maybe be dropped in the future. > > > > Sorry, only just catching up on your other thread. The folio lock was to ensure > > the GPU got a chance to make forward progress by mapping the page. Without it > > the CPU could immediately invalidate the entry before the GPU had a chance to > > retry the fault. > > > Obviously performance wise having such thrashing is terrible, so should > > really be avoided by userspace, but the lock at least allowed such programs > > to complete. > > Thanks for the clarification. So it's relevant that the MMU notifier in > remove_device_exclusive_entry() is sent after taking the folio lock. > > However, as soon as we drop the folio lock, remove_device_exclusive_entry() > will become active, lock the folio and trigger the MMU notifier. > > So the time it is actually mapped into the device is ratherLooks like you cut off a bit here (or mail transport did that somewhere), but see my other reply I don't think this is a legit use-case. So we don't have to worry. Well beyond documenting that if userspace concurrently thrashes the same page with both device atomics and cpu access it will stall real bad. -Sima -- Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
David Hildenbrand
2025-Jan-30 15:29 UTC
[PATCH v1 4/4] mm/memory: document restore_exclusive_pte()
On 30.01.25 14:31, Simona Vetter wrote:> On Thu, Jan 30, 2025 at 10:37:06AM +0100, David Hildenbrand wrote: >> On 30.01.25 01:27, Alistair Popple wrote: >>> On Wed, Jan 29, 2025 at 12:58:02PM +0100, David Hildenbrand wrote: >>>> Let's document how this function is to be used, and why the requirement >>>> for the folio lock might maybe be dropped in the future. >>> >>> Sorry, only just catching up on your other thread. The folio lock was to ensure >>> the GPU got a chance to make forward progress by mapping the page. Without it >>> the CPU could immediately invalidate the entry before the GPU had a chance to >>> retry the fault. >>>> Obviously performance wise having such thrashing is terrible, so should >>> really be avoided by userspace, but the lock at least allowed such programs >>> to complete. >> >> Thanks for the clarification. So it's relevant that the MMU notifier in >> remove_device_exclusive_entry() is sent after taking the folio lock. >> >> However, as soon as we drop the folio lock, remove_device_exclusive_entry() >> will become active, lock the folio and trigger the MMU notifier. >> >> So the time it is actually mapped into the device is ratherI meant to say "rather short." :)> > Looks like you cut off a bit here (or mail transport did that somewhere), > but see my other reply I don't think this is a legit use-case. So we don't > have to worry.In that case, we would need the folio lock in the future.> Well beyond documenting that if userspace concurrently thrashes > the same page with both device atomics and cpu access it will stall real > bad.I'm curious, is locking between device-cpu or device-device something that can happen frequently? In that case, you would get that trashing naturally? -- Cheers, David / dhildenb