David Hildenbrand
2025-Jan-29 11:54 UTC
[PATCH v1 05/12] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable()
Let's do it just like mprotect write-upgrade or during NUMA-hinting faults on PROT_NONE PTEs: detect if the PTE can be writable by using can_change_pte_writable(). Set the PTE only dirty if the folio is dirty: we might not necessarily have a write access, and setting the PTE writable doesn't require setting the PTE dirty. With this change in place, there is no need to have separate readable and writable device-exclusive entry types, and we'll merge them next separately. Note that, during fork(), we first convert the device-exclusive entries back to ordinary PTEs, and we only ever allow conversion of writable PTEs to device-exclusive -- only mprotect can currently change them to readable-device-exclusive. Consequently, we always expect PageAnonExclusive(page)==true and can_change_pte_writable()==true, unless we are dealing with soft-dirty tracking or uffd-wp. But reusing can_change_pte_writable() for now is cleaner. Signed-off-by: David Hildenbrand <david at redhat.com> --- mm/memory.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 03efeeef895a..db38d6ae4e74 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -725,18 +725,21 @@ static void restore_exclusive_pte(struct vm_area_struct *vma, struct folio *folio = page_folio(page); pte_t orig_pte; pte_t pte; - swp_entry_t entry; orig_pte = ptep_get(ptep); pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot))); if (pte_swp_soft_dirty(orig_pte)) pte = pte_mksoft_dirty(pte); - entry = pte_to_swp_entry(orig_pte); if (pte_swp_uffd_wp(orig_pte)) pte = pte_mkuffd_wp(pte); - else if (is_writable_device_exclusive_entry(entry)) - pte = maybe_mkwrite(pte_mkdirty(pte), vma); + + if ((vma->vm_flags & VM_WRITE) && + can_change_pte_writable(vma, address, pte)) { + if (folio_test_dirty(folio)) + pte = pte_mkdirty(pte); + pte = pte_mkwrite(pte, vma); + } VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) && PageAnonExclusive(page)), folio); -- 2.48.1
Simona Vetter
2025-Jan-30 09:51 UTC
[PATCH v1 05/12] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable()
On Wed, Jan 29, 2025 at 12:54:03PM +0100, David Hildenbrand wrote:> Let's do it just like mprotect write-upgrade or during NUMA-hinting > faults on PROT_NONE PTEs: detect if the PTE can be writable by using > can_change_pte_writable(). > > Set the PTE only dirty if the folio is dirty: we might not > necessarily have a write access, and setting the PTE writable doesn't > require setting the PTE dirty.Not sure whether there's much difference in practice, since a device exclusive access means a write, so the folio better be dirty (unless we aborted halfway through). But then I couldn't find the code in nouveau to do that, so now I'm confused. -Sima> With this change in place, there is no need to have separate > readable and writable device-exclusive entry types, and we'll merge > them next separately. > > Note that, during fork(), we first convert the device-exclusive entries > back to ordinary PTEs, and we only ever allow conversion of writable > PTEs to device-exclusive -- only mprotect can currently change them to > readable-device-exclusive. Consequently, we always expect > PageAnonExclusive(page)==true and can_change_pte_writable()==true, > unless we are dealing with soft-dirty tracking or uffd-wp. But reusing > can_change_pte_writable() for now is cleaner. > > Signed-off-by: David Hildenbrand <david at redhat.com> > --- > mm/memory.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 03efeeef895a..db38d6ae4e74 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -725,18 +725,21 @@ static void restore_exclusive_pte(struct vm_area_struct *vma, > struct folio *folio = page_folio(page); > pte_t orig_pte; > pte_t pte; > - swp_entry_t entry; > > orig_pte = ptep_get(ptep); > pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot))); > if (pte_swp_soft_dirty(orig_pte)) > pte = pte_mksoft_dirty(pte); > > - entry = pte_to_swp_entry(orig_pte); > if (pte_swp_uffd_wp(orig_pte)) > pte = pte_mkuffd_wp(pte); > - else if (is_writable_device_exclusive_entry(entry)) > - pte = maybe_mkwrite(pte_mkdirty(pte), vma); > + > + if ((vma->vm_flags & VM_WRITE) && > + can_change_pte_writable(vma, address, pte)) { > + if (folio_test_dirty(folio)) > + pte = pte_mkdirty(pte); > + pte = pte_mkwrite(pte, vma); > + } > > VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) && > PageAnonExclusive(page)), folio); > -- > 2.48.1 >-- Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
David Hildenbrand
2025-Jan-30 09:58 UTC
[PATCH v1 05/12] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable()
On 30.01.25 10:51, Simona Vetter wrote:> On Wed, Jan 29, 2025 at 12:54:03PM +0100, David Hildenbrand wrote: >> Let's do it just like mprotect write-upgrade or during NUMA-hinting >> faults on PROT_NONE PTEs: detect if the PTE can be writable by using >> can_change_pte_writable(). >> >> Set the PTE only dirty if the folio is dirty: we might not >> necessarily have a write access, and setting the PTE writable doesn't >> require setting the PTE dirty. > > Not sure whether there's much difference in practice, since a device > exclusive access means a write, so the folio better be dirty (unless we > aborted halfway through). But then I couldn't find the code in nouveau to > do that, so now I'm confused.That confused me as well. Requiring the PTE to be writable does not imply that it is dirty. So something must either set the PTE or the folio dirty. ( In practice, most anonymous folios are dirty most of the time ... ) If we assume that "device-exclusive entries" are always dirty, then it doesn't make sense to set the folio dirty when creating device-exclusive entries. We'd always have to set the PTE dirty when restoring the exclusive pte. -- Cheers, David / dhildenb