Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops
Some folios, such as hugetlb folios and zone device folios, require special handling when the folio's reference count reaches 0, before being freed. Moreover, guest_memfd folios will likely require special handling to notify guest_memfd once a folio's reference count reaches 0, to facilitate shared to private folio conversion [*].

Currently, each use case has its own dedicated callback for when the folio refcount reaches 0. Adding yet more callbacks is not ideal.

This patch series introduces struct folio_owner_ops and uses it as a generic way to handle callbacks on freeing a folio. It also applies the callbacks to hugetlb and zone device folios.

A pointer to struct folio_owner_ops is overlaid on struct page compound_head, pgmap, and struct page/folio lru. To indicate that the folio uses the callback, this patch series sets bit 1 of the new field, similar to how bit 0 indicates a compound page.

Patches 1 to 6 rework the hugetlb code to allow us to reuse folio->lru for the owner ops, as long as the folios are not isolated.

Patches 7 to 10 introduce struct folio_owner_ops, and apply the callbacks to zone device and hugetlb folios.

Cheers,
/fuad

[*] https://lore.kernel.org/all/CAGtprH_JP2w-4rq02h_Ugvq5KuHX7TUvegOS7xUs_iy5hriE7g at mail.gmail.com/

David Hildenbrand (6):
  mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb()
  mm/migrate: don't call folio_putback_active_hugetlb() on dst hugetlb folio
  mm/hugetlb: rename "folio_putback_active_hugetlb()" to "folio_putback_hugetlb()"
  mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios
  mm/hugetlb: use folio->lru int demote_free_hugetlb_folios()
  mm/hugetlb: use separate folio->_hugetlb_list for hugetlb-internals

Fuad Tabba (4):
  mm: Introduce struct folio_owner_ops
  mm: Use getters and setters to access page pgmap
  mm: Use owner_ops on folio_put for zone device pages
  mm: hugetlb: Use owner_ops on folio_put for hugetlb

 drivers/gpu/drm/nouveau/nouveau_dmem.c |   4 +-
 drivers/pci/p2pdma.c                   |   8 +-
 include/linux/hugetlb.h                |  10 +-
 include/linux/memremap.h               |  14 +-
 include/linux/mm_types.h               | 107 ++++++++++++++-
 lib/test_hmm.c                         |   2 +-
 mm/gup.c                               |   2 +-
 mm/hmm.c                               |   2 +-
 mm/hugetlb.c                           | 179 ++++++++++++++++++-------
 mm/hugetlb_cgroup.c                    |  19 ++-
 mm/hugetlb_vmemmap.c                   |   8 +-
 mm/internal.h                          |   1 -
 mm/memory.c                            |   2 +-
 mm/mempolicy.c                         |   2 +-
 mm/memremap.c                          |  49 +------
 mm/migrate.c                           |  20 +--
 mm/migrate_device.c                    |   4 +-
 mm/mm_init.c                           |  48 ++++++-
 mm/swap.c                              |  25 ++--
 19 files changed, 342 insertions(+), 164 deletions(-)

base-commit: beb2622b970047000fa3cae64c23585669b01fca
--
2.47.0.277.g8800431eea-goog
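The mechanism the whole series hinges on is a tagged pointer: the word shared with compound_head/lru carries a folio_owner_ops pointer with bit 1 set (bit 0 already marks compound pages). A minimal user-space sketch of that encoding and decoding follows; the struct, helper, and callback names here are illustrative stand-ins, not the kernel API added in patch 07.

/*
 * User-space model of the bit-1 "owner_ops" pointer tagging described in
 * the cover letter.  Illustrative only; the real helpers are added to
 * include/linux/mm_types.h in patch 07.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct owner_ops {
	void (*free)(void *obj);
};

#define OWNER_OPS_BIT	1UL
#define OWNER_OPS_FLAG	(1UL << OWNER_OPS_BIT)	/* bit 0 is taken by "compound page" */

/* Encode: store the ops pointer with bit 1 set in the shared word. */
static uintptr_t encode_owner_ops(const struct owner_ops *ops)
{
	/* ops is at least word-aligned, so bits 0-1 are free for tags */
	return (uintptr_t)ops | OWNER_OPS_FLAG;
}

/* Decode: return the ops pointer only if bit 1 is set, else NULL. */
static const struct owner_ops *decode_owner_ops(uintptr_t word)
{
	if (!(word & OWNER_OPS_FLAG))
		return NULL;
	return (const struct owner_ops *)(word & ~OWNER_OPS_FLAG);
}

static void dummy_free(void *obj)
{
	printf("free callback for %p\n", obj);
}

int main(void)
{
	static const struct owner_ops ops = { .free = dummy_free };
	uintptr_t word;
	int x;

	word = encode_owner_ops(&ops);		/* folio_set_owner_ops() analogue */
	assert(decode_owner_ops(word) == &ops);	/* folio_get_owner_ops() analogue */
	decode_owner_ops(word)->free(&x);

	word = (uintptr_t)&x;			/* e.g. an ordinary list/lru pointer */
	assert(decode_owner_ops(word) == NULL);	/* no bit 1 => no owner_ops */
	return 0;
}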
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 01/10] mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb()
From: David Hildenbrand <david at redhat.com> Let's make the function name match "folio_isolate_lru()", and add some kernel doc. Signed-off-by: David Hildenbrand <david at redhat.com> Signed-off-by: Fuad Tabba <tabba at google.com> --- include/linux/hugetlb.h | 4 ++-- mm/gup.c | 2 +- mm/hugetlb.c | 23 ++++++++++++++++++++--- mm/mempolicy.c | 2 +- mm/migrate.c | 6 +++--- 5 files changed, 27 insertions(+), 10 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ae4fe8615bb6..b0cf8dbfeb6a 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -153,7 +153,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to, vm_flags_t vm_flags); long hugetlb_unreserve_pages(struct inode *inode, long start, long end, long freed); -bool isolate_hugetlb(struct folio *folio, struct list_head *list); +bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list); int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison); int get_huge_page_for_hwpoison(unsigned long pfn, int flags, bool *migratable_cleared); @@ -414,7 +414,7 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, return NULL; } -static inline bool isolate_hugetlb(struct folio *folio, struct list_head *list) +static inline bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list) { return false; } diff --git a/mm/gup.c b/mm/gup.c index 28ae330ec4dd..40bbcffca865 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2301,7 +2301,7 @@ static unsigned long collect_longterm_unpinnable_folios( continue; if (folio_test_hugetlb(folio)) { - isolate_hugetlb(folio, movable_folio_list); + folio_isolate_hugetlb(folio, movable_folio_list); continue; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index cec4b121193f..e17bb2847572 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2868,7 +2868,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h, * Fail with -EBUSY if not possible. */ spin_unlock_irq(&hugetlb_lock); - isolated = isolate_hugetlb(old_folio, list); + isolated = folio_isolate_hugetlb(old_folio, list); ret = isolated ? 0 : -EBUSY; spin_lock_irq(&hugetlb_lock); goto free_new; @@ -2953,7 +2953,7 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list) if (hstate_is_gigantic(h)) return -ENOMEM; - if (folio_ref_count(folio) && isolate_hugetlb(folio, list)) + if (folio_ref_count(folio) && folio_isolate_hugetlb(folio, list)) ret = 0; else if (!folio_ref_count(folio)) ret = alloc_and_dissolve_hugetlb_folio(h, folio, list); @@ -7396,7 +7396,24 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ -bool isolate_hugetlb(struct folio *folio, struct list_head *list) +/** + * folio_isolate_hugetlb: try to isolate an allocated hugetlb folio + * @folio: the folio to isolate + * @list: the list to add the folio to on success + * + * Isolate an allocated (refcount > 0) hugetlb folio, marking it as + * isolated/non-migratable, and moving it from the active list to the + * given list. + * + * Isolation will fail if @folio is not an allocated hugetlb folio, or if + * it is already isolated/non-migratable. + * + * On success, an additional folio reference is taken that must be dropped + * using folio_putback_active_hugetlb() to undo the isolation. + * + * Return: True if isolation worked, otherwise False. 
+ */ +bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list) { bool ret = true; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index bb37cd1a51d8..41bdff67757c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -647,7 +647,7 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask, */ if ((flags & MPOL_MF_MOVE_ALL) || (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte))) - if (!isolate_hugetlb(folio, qp->pagelist)) + if (!folio_isolate_hugetlb(folio, qp->pagelist)) qp->nr_failed++; unlock: spin_unlock(ptl); diff --git a/mm/migrate.c b/mm/migrate.c index dfb5eba3c522..55585b5f57ec 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -136,7 +136,7 @@ static void putback_movable_folio(struct folio *folio) * * This function shall be used whenever the isolated pageset has been * built from lru, balloon, hugetlbfs page. See isolate_migratepages_range() - * and isolate_hugetlb(). + * and folio_isolate_hugetlb(). */ void putback_movable_pages(struct list_head *l) { @@ -177,7 +177,7 @@ bool isolate_folio_to_list(struct folio *folio, struct list_head *list) bool isolated, lru; if (folio_test_hugetlb(folio)) - return isolate_hugetlb(folio, list); + return folio_isolate_hugetlb(folio, list); lru = !__folio_test_movable(folio); if (lru) @@ -2208,7 +2208,7 @@ static int __add_folio_for_migration(struct folio *folio, int node, return -EACCES; if (folio_test_hugetlb(folio)) { - if (isolate_hugetlb(folio, pagelist)) + if (folio_isolate_hugetlb(folio, pagelist)) return 1; } else if (folio_isolate_lru(folio)) { list_add_tail(&folio->lru, pagelist); -- 2.47.0.277.g8800431eea-goog
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 02/10] mm/migrate: don't call folio_putback_active_hugetlb() on dst hugetlb folio
From: David Hildenbrand <david at redhat.com> We replaced a simple put_page() by a putback_active_hugepage() call in commit 3aaa76e125c1 (" mm: migrate: hugetlb: putback destination hugepage to active list"), to set the "active" flag on the dst hugetlb folio. Nowadays, we decoupled the "active" list from the flag, by calling the flag "migratable". Calling "putback" on something that wasn't allocated is weird and not future proof, especially if we might reach that path when migration failed and we just want to free the freshly allocated hugetlb folio. Let's simply set the "migratable" flag in move_hugetlb_state(), where we know that allocation succeeded, and use simple folio_put() to return our reference. Do we need the hugetlb_lock for setting that flag? Staring at other users of folio_set_hugetlb_migratable(), it does not look like it. After all, the dst folio should already be on the active list, and we are not modifying that list. Signed-off-by: David Hildenbrand <david at redhat.com> Signed-off-by: Fuad Tabba <tabba at google.com> --- mm/hugetlb.c | 5 +++++ mm/migrate.c | 8 ++++---- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e17bb2847572..da3fe1840ab8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -7508,6 +7508,11 @@ void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int re } spin_unlock_irq(&hugetlb_lock); } + /* + * Our old folio is isolated and has "migratable" cleared until it + * is putback. As migration succeeded, set the new folio "migratable". + */ + folio_set_hugetlb_migratable(new_folio); } static void hugetlb_unshare_pmds(struct vm_area_struct *vma, diff --git a/mm/migrate.c b/mm/migrate.c index 55585b5f57ec..b129dc41c140 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1547,14 +1547,14 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio, list_move_tail(&src->lru, ret); /* - * If migration was not successful and there's a freeing callback, use - * it. Otherwise, put_page() will drop the reference grabbed during - * isolation. + * If migration was not successful and there's a freeing callback, + * return the folio to that special allocator. Otherwise, simply drop + * our additional reference. */ if (put_new_folio) put_new_folio(dst, private); else - folio_putback_active_hugetlb(dst); + folio_put(dst); return rc; } -- 2.47.0.277.g8800431eea-goog
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 03/10] mm/hugetlb: rename "folio_putback_active_hugetlb()" to "folio_putback_hugetlb()"
From: David Hildenbrand <david at redhat.com> Now that folio_putback_hugetlb() is only called on folios that were previously isolated through folio_isolate_hugetlb(), let's rename it to match folio_putback_lru(). Add some kernel doc to clarify how this function is supposed to be used. Signed-off-by: David Hildenbrand <david at redhat.com> Signed-off-by: Fuad Tabba <tabba at google.com> --- include/linux/hugetlb.h | 4 ++-- mm/hugetlb.c | 15 +++++++++++++-- mm/migrate.c | 6 +++--- 3 files changed, 18 insertions(+), 7 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index b0cf8dbfeb6a..e846d7dac77c 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -157,7 +157,7 @@ bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list); int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison); int get_huge_page_for_hwpoison(unsigned long pfn, int flags, bool *migratable_cleared); -void folio_putback_active_hugetlb(struct folio *folio); +void folio_putback_hugetlb(struct folio *folio); void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason); void hugetlb_fix_reserve_counts(struct inode *inode); extern struct mutex *hugetlb_fault_mutex_table; @@ -430,7 +430,7 @@ static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags, return 0; } -static inline void folio_putback_active_hugetlb(struct folio *folio) +static inline void folio_putback_hugetlb(struct folio *folio) { } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index da3fe1840ab8..d58bd815fdf2 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -7409,7 +7409,7 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) * it is already isolated/non-migratable. * * On success, an additional folio reference is taken that must be dropped - * using folio_putback_active_hugetlb() to undo the isolation. + * using folio_putback_hugetlb() to undo the isolation. * * Return: True if isolation worked, otherwise False. */ @@ -7461,7 +7461,18 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags, return ret; } -void folio_putback_active_hugetlb(struct folio *folio) +/** + * folio_putback_hugetlb: unisolate a hugetlb folio + * @folio: the isolated hugetlb folio + * + * Putback/un-isolate the hugetlb folio that was previous isolated using + * folio_isolate_hugetlb(): marking it non-isolated/migratable and putting it + * back onto the active list. + * + * Will drop the additional folio reference obtained through + * folio_isolate_hugetlb(). + */ +void folio_putback_hugetlb(struct folio *folio) { spin_lock_irq(&hugetlb_lock); folio_set_hugetlb_migratable(folio); diff --git a/mm/migrate.c b/mm/migrate.c index b129dc41c140..89292d131148 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -145,7 +145,7 @@ void putback_movable_pages(struct list_head *l) list_for_each_entry_safe(folio, folio2, l, lru) { if (unlikely(folio_test_hugetlb(folio))) { - folio_putback_active_hugetlb(folio); + folio_putback_hugetlb(folio); continue; } list_del(&folio->lru); @@ -1459,7 +1459,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio, if (folio_ref_count(src) == 1) { /* page was freed from under us. So we are done. 
*/ - folio_putback_active_hugetlb(src); + folio_putback_hugetlb(src); return MIGRATEPAGE_SUCCESS; } @@ -1542,7 +1542,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio, folio_unlock(src); out: if (rc == MIGRATEPAGE_SUCCESS) - folio_putback_active_hugetlb(src); + folio_putback_hugetlb(src); else if (rc != -EAGAIN) list_move_tail(&src->lru, ret); -- 2.47.0.277.g8800431eea-goog
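Taken together, patches 01-03 pin down a simple isolation contract: folio_isolate_hugetlb() succeeds only for an allocated (refcount > 0), not-yet-isolated folio, takes an extra reference, and folio_putback_hugetlb() later clears the isolation and drops that reference. A small user-space model of that contract, with purely illustrative names:

/*
 * User-space model of the folio_isolate_hugetlb()/folio_putback_hugetlb()
 * contract documented in patches 01-03.  Illustrative names only.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_folio {
	int refcount;		/* folio_ref_count() analogue */
	bool migratable;	/* "migratable" flag; cleared while isolated */
};

/* Succeeds only for allocated (refcount > 0), not-yet-isolated folios. */
static bool fake_isolate(struct fake_folio *f)
{
	if (f->refcount == 0 || !f->migratable)
		return false;
	f->refcount++;		/* extra reference held while isolated */
	f->migratable = false;
	return true;
}

/* Undo isolation: mark migratable again and drop the extra reference. */
static void fake_putback(struct fake_folio *f)
{
	f->migratable = true;
	f->refcount--;
}

int main(void)
{
	struct fake_folio f = { .refcount = 1, .migratable = true };

	assert(fake_isolate(&f));	/* first isolation works */
	assert(!fake_isolate(&f));	/* already isolated -> fails */
	assert(f.refcount == 2);	/* pinned: cannot be freed while isolated */

	fake_putback(&f);
	assert(f.refcount == 1 && f.migratable);

	struct fake_folio freed = { .refcount = 0, .migratable = true };
	assert(!fake_isolate(&freed));	/* not allocated -> fails */

	printf("isolate/putback contract holds\n");
	return 0;
}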
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 04/10] mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios
From: David Hildenbrand <david at redhat.com> Let's convert hugetlb_cgroup_css_offline() and hugetlb_cgroup_move_parent() to work on folios. hugepage_activelist contains folios, not pages. While at it, rename page_hcg simply to hcg, removing most of the "page" terminology. Signed-off-by: David Hildenbrand <david at redhat.com> Signed-off-by: Fuad Tabba <tabba at google.com> --- mm/hugetlb_cgroup.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index d8d0e665caed..1bdeaf25f640 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -195,24 +195,23 @@ static void hugetlb_cgroup_css_free(struct cgroup_subsys_state *css) * cannot fail. */ static void hugetlb_cgroup_move_parent(int idx, struct hugetlb_cgroup *h_cg, - struct page *page) + struct folio *folio) { unsigned int nr_pages; struct page_counter *counter; - struct hugetlb_cgroup *page_hcg; + struct hugetlb_cgroup *hcg; struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(h_cg); - struct folio *folio = page_folio(page); - page_hcg = hugetlb_cgroup_from_folio(folio); + hcg = hugetlb_cgroup_from_folio(folio); /* * We can have pages in active list without any cgroup * ie, hugepage with less than 3 pages. We can safely * ignore those pages. */ - if (!page_hcg || page_hcg != h_cg) + if (!hcg || hcg != h_cg) goto out; - nr_pages = compound_nr(page); + nr_pages = folio_nr_pages(folio); if (!parent) { parent = root_h_cgroup; /* root has no limit */ @@ -235,13 +234,13 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css) { struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css); struct hstate *h; - struct page *page; + struct folio *folio; do { for_each_hstate(h) { spin_lock_irq(&hugetlb_lock); - list_for_each_entry(page, &h->hugepage_activelist, lru) - hugetlb_cgroup_move_parent(hstate_index(h), h_cg, page); + list_for_each_entry(folio, &h->hugepage_activelist, lru) + hugetlb_cgroup_move_parent(hstate_index(h), h_cg, folio); spin_unlock_irq(&hugetlb_lock); } -- 2.47.0.277.g8800431eea-goog
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 05/10] mm/hugetlb: use folio->lru int demote_free_hugetlb_folios()
From: David Hildenbrand <david at redhat.com> Let's avoid messing with pages. Signed-off-by: David Hildenbrand <david at redhat.com> Signed-off-by: Fuad Tabba <tabba at google.com> --- mm/hugetlb.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d58bd815fdf2..a64852280213 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3806,13 +3806,15 @@ static long demote_free_hugetlb_folios(struct hstate *src, struct hstate *dst, for (i = 0; i < pages_per_huge_page(src); i += pages_per_huge_page(dst)) { struct page *page = folio_page(folio, i); + struct folio *new_folio; page->mapping = NULL; clear_compound_head(page); prep_compound_page(page, dst->order); + new_folio = page_folio(page); - init_new_hugetlb_folio(dst, page_folio(page)); - list_add(&page->lru, &dst_list); + init_new_hugetlb_folio(dst, new_folio); + list_add(&new_folio->lru, &dst_list); } } -- 2.47.0.277.g8800431eea-goog
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 06/10] mm/hugetlb: use separate folio->_hugetlb_list for hugetlb-internals
From: David Hildenbrand <david at redhat.com> Let's use a separate list head in the folio, as long as hugetlb folios are not isolated. This way, we can reuse folio->lru for different purpose (e.g., owner_ops) as long as they are not isolated. Consequently, folio->lru will only be used while there is an additional folio reference that cannot be dropped until putback/un-isolated. Signed-off-by: David Hildenbrand <david at redhat.com> Signed-off-by: Fuad Tabba <tabba at google.com> --- include/linux/mm_types.h | 18 +++++++++ mm/hugetlb.c | 81 +++++++++++++++++++++------------------- mm/hugetlb_cgroup.c | 4 +- mm/hugetlb_vmemmap.c | 8 ++-- 4 files changed, 66 insertions(+), 45 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 80fef38d9d64..365c73be0bb4 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -310,6 +310,7 @@ typedef struct { * @_hugetlb_cgroup: Do not use directly, use accessor in hugetlb_cgroup.h. * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h. * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head(). + * @_hugetlb_list: To be used in hugetlb core code only. * @_deferred_list: Folios to be split under memory pressure. * @_unused_slab_obj_exts: Placeholder to match obj_exts in struct slab. * @@ -397,6 +398,17 @@ struct folio { }; struct page __page_2; }; + union { + struct { + unsigned long _flags_3; + unsigned long _head_3; + /* public: */ + struct list_head _hugetlb_list; + /* private: the union with struct page is transitional */ + }; + struct page __page_3; + }; + }; #define FOLIO_MATCH(pg, fl) \ @@ -433,6 +445,12 @@ FOLIO_MATCH(compound_head, _head_2); FOLIO_MATCH(flags, _flags_2a); FOLIO_MATCH(compound_head, _head_2a); #undef FOLIO_MATCH +#define FOLIO_MATCH(pg, fl) \ + static_assert(offsetof(struct folio, fl) == \ + offsetof(struct page, pg) + 3 * sizeof(struct page)) +FOLIO_MATCH(flags, _flags_3); +FOLIO_MATCH(compound_head, _head_3); +#undef FOLIO_MATCH /** * struct ptdesc - Memory descriptor for page tables. 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a64852280213..2308e94d8615 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1316,7 +1316,7 @@ static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio) lockdep_assert_held(&hugetlb_lock); VM_BUG_ON_FOLIO(folio_ref_count(folio), folio); - list_move(&folio->lru, &h->hugepage_freelists[nid]); + list_move(&folio->_hugetlb_list, &h->hugepage_freelists[nid]); h->free_huge_pages++; h->free_huge_pages_node[nid]++; folio_set_hugetlb_freed(folio); @@ -1329,14 +1329,14 @@ static struct folio *dequeue_hugetlb_folio_node_exact(struct hstate *h, bool pin = !!(current->flags & PF_MEMALLOC_PIN); lockdep_assert_held(&hugetlb_lock); - list_for_each_entry(folio, &h->hugepage_freelists[nid], lru) { + list_for_each_entry(folio, &h->hugepage_freelists[nid], _hugetlb_list) { if (pin && !folio_is_longterm_pinnable(folio)) continue; if (folio_test_hwpoison(folio)) continue; - list_move(&folio->lru, &h->hugepage_activelist); + list_move(&folio->_hugetlb_list, &h->hugepage_activelist); folio_ref_unfreeze(folio, 1); folio_clear_hugetlb_freed(folio); h->free_huge_pages--; @@ -1599,7 +1599,7 @@ static void remove_hugetlb_folio(struct hstate *h, struct folio *folio, if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) return; - list_del(&folio->lru); + list_del(&folio->_hugetlb_list); if (folio_test_hugetlb_freed(folio)) { folio_clear_hugetlb_freed(folio); @@ -1616,8 +1616,9 @@ static void remove_hugetlb_folio(struct hstate *h, struct folio *folio, * pages. Otherwise, someone (memory error handling) may try to write * to tail struct pages. */ - if (!folio_test_hugetlb_vmemmap_optimized(folio)) + if (!folio_test_hugetlb_vmemmap_optimized(folio)) { __folio_clear_hugetlb(folio); + } h->nr_huge_pages--; h->nr_huge_pages_node[nid]--; @@ -1632,7 +1633,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio, lockdep_assert_held(&hugetlb_lock); - INIT_LIST_HEAD(&folio->lru); + INIT_LIST_HEAD(&folio->_hugetlb_list); h->nr_huge_pages++; h->nr_huge_pages_node[nid]++; @@ -1640,8 +1641,8 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio, h->surplus_huge_pages++; h->surplus_huge_pages_node[nid]++; } - __folio_set_hugetlb(folio); + folio_change_private(folio, NULL); /* * We have to set hugetlb_vmemmap_optimized again as above @@ -1789,8 +1790,8 @@ static void bulk_vmemmap_restore_error(struct hstate *h, * hugetlb pages with vmemmap we will free up memory so that we * can allocate vmemmap for more hugetlb pages. */ - list_for_each_entry_safe(folio, t_folio, non_hvo_folios, lru) { - list_del(&folio->lru); + list_for_each_entry_safe(folio, t_folio, non_hvo_folios, _hugetlb_list) { + list_del(&folio->_hugetlb_list); spin_lock_irq(&hugetlb_lock); __folio_clear_hugetlb(folio); spin_unlock_irq(&hugetlb_lock); @@ -1808,14 +1809,14 @@ static void bulk_vmemmap_restore_error(struct hstate *h, * If are able to restore vmemmap and free one hugetlb page, we * quit processing the list to retry the bulk operation. 
*/ - list_for_each_entry_safe(folio, t_folio, folio_list, lru) + list_for_each_entry_safe(folio, t_folio, folio_list, _hugetlb_list) if (hugetlb_vmemmap_restore_folio(h, folio)) { - list_del(&folio->lru); + list_del(&folio->_hugetlb_list); spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, true); spin_unlock_irq(&hugetlb_lock); } else { - list_del(&folio->lru); + list_del(&folio->_hugetlb_list); spin_lock_irq(&hugetlb_lock); __folio_clear_hugetlb(folio); spin_unlock_irq(&hugetlb_lock); @@ -1856,12 +1857,12 @@ static void update_and_free_pages_bulk(struct hstate *h, VM_WARN_ON(ret < 0); if (!list_empty(&non_hvo_folios) && ret) { spin_lock_irq(&hugetlb_lock); - list_for_each_entry(folio, &non_hvo_folios, lru) + list_for_each_entry(folio, &non_hvo_folios, _hugetlb_list) __folio_clear_hugetlb(folio); spin_unlock_irq(&hugetlb_lock); } - list_for_each_entry_safe(folio, t_folio, &non_hvo_folios, lru) { + list_for_each_entry_safe(folio, t_folio, &non_hvo_folios, _hugetlb_list) { update_and_free_hugetlb_folio(h, folio, false); cond_resched(); } @@ -1959,7 +1960,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid) static void init_new_hugetlb_folio(struct hstate *h, struct folio *folio) { __folio_set_hugetlb(folio); - INIT_LIST_HEAD(&folio->lru); + INIT_LIST_HEAD(&folio->_hugetlb_list); hugetlb_set_folio_subpool(folio, NULL); set_hugetlb_cgroup(folio, NULL); set_hugetlb_cgroup_rsvd(folio, NULL); @@ -2112,7 +2113,7 @@ static void prep_and_add_allocated_folios(struct hstate *h, /* Add all new pool pages to free lists in one lock cycle */ spin_lock_irqsave(&hugetlb_lock, flags); - list_for_each_entry_safe(folio, tmp_f, folio_list, lru) { + list_for_each_entry_safe(folio, tmp_f, folio_list, _hugetlb_list) { __prep_account_new_huge_page(h, folio_nid(folio)); enqueue_hugetlb_folio(h, folio); } @@ -2165,7 +2166,7 @@ static struct folio *remove_pool_hugetlb_folio(struct hstate *h, if ((!acct_surplus || h->surplus_huge_pages_node[node]) && !list_empty(&h->hugepage_freelists[node])) { folio = list_entry(h->hugepage_freelists[node].next, - struct folio, lru); + struct folio, _hugetlb_list); remove_hugetlb_folio(h, folio, acct_surplus); break; } @@ -2491,7 +2492,7 @@ static int gather_surplus_pages(struct hstate *h, long delta) alloc_ok = false; break; } - list_add(&folio->lru, &surplus_list); + list_add(&folio->_hugetlb_list, &surplus_list); cond_resched(); } allocated += i; @@ -2526,7 +2527,7 @@ static int gather_surplus_pages(struct hstate *h, long delta) ret = 0; /* Free the needed pages to the hugetlb pool */ - list_for_each_entry_safe(folio, tmp, &surplus_list, lru) { + list_for_each_entry_safe(folio, tmp, &surplus_list, _hugetlb_list) { if ((--needed) < 0) break; /* Add the page to the hugetlb allocator */ @@ -2539,7 +2540,7 @@ static int gather_surplus_pages(struct hstate *h, long delta) * Free unnecessary surplus pages to the buddy allocator. * Pages have no ref count, call free_huge_folio directly. 
*/ - list_for_each_entry_safe(folio, tmp, &surplus_list, lru) + list_for_each_entry_safe(folio, tmp, &surplus_list, _hugetlb_list) free_huge_folio(folio); spin_lock_irq(&hugetlb_lock); @@ -2588,7 +2589,7 @@ static void return_unused_surplus_pages(struct hstate *h, if (!folio) goto out; - list_add(&folio->lru, &page_list); + list_add(&folio->_hugetlb_list, &page_list); } out: @@ -3051,7 +3052,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, folio_set_hugetlb_restore_reserve(folio); h->resv_huge_pages--; } - list_add(&folio->lru, &h->hugepage_activelist); + list_add(&folio->_hugetlb_list, &h->hugepage_activelist); folio_ref_unfreeze(folio, 1); /* Fall through */ } @@ -3211,7 +3212,7 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h, /* Send list for bulk vmemmap optimization processing */ hugetlb_vmemmap_optimize_folios(h, folio_list); - list_for_each_entry_safe(folio, tmp_f, folio_list, lru) { + list_for_each_entry_safe(folio, tmp_f, folio_list, _hugetlb_list) { if (!folio_test_hugetlb_vmemmap_optimized(folio)) { /* * If HVO fails, initialize all tail struct pages @@ -3260,7 +3261,7 @@ static void __init gather_bootmem_prealloc_node(unsigned long nid) hugetlb_folio_init_vmemmap(folio, h, HUGETLB_VMEMMAP_RESERVE_PAGES); init_new_hugetlb_folio(h, folio); - list_add(&folio->lru, &folio_list); + list_add(&folio->_hugetlb_list, &folio_list); /* * We need to restore the 'stolen' pages to totalram_pages @@ -3317,7 +3318,7 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid) &node_states[N_MEMORY], NULL); if (!folio) break; - list_add(&folio->lru, &folio_list); + list_add(&folio->_hugetlb_list, &folio_list); } cond_resched(); } @@ -3379,7 +3380,7 @@ static void __init hugetlb_pages_alloc_boot_node(unsigned long start, unsigned l if (!folio) break; - list_move(&folio->lru, &folio_list); + list_move(&folio->_hugetlb_list, &folio_list); cond_resched(); } @@ -3544,13 +3545,13 @@ static void try_to_free_low(struct hstate *h, unsigned long count, for_each_node_mask(i, *nodes_allowed) { struct folio *folio, *next; struct list_head *freel = &h->hugepage_freelists[i]; - list_for_each_entry_safe(folio, next, freel, lru) { + list_for_each_entry_safe(folio, next, freel, _hugetlb_list) { if (count >= h->nr_huge_pages) goto out; if (folio_test_highmem(folio)) continue; remove_hugetlb_folio(h, folio, false); - list_add(&folio->lru, &page_list); + list_add(&folio->_hugetlb_list, &page_list); } } @@ -3703,7 +3704,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, goto out; } - list_add(&folio->lru, &page_list); + list_add(&folio->_hugetlb_list, &page_list); allocated++; /* Bail for signals. 
Probably ctrl-c from user */ @@ -3750,7 +3751,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, if (!folio) break; - list_add(&folio->lru, &page_list); + list_add(&folio->_hugetlb_list, &page_list); } /* free the pages after dropping lock */ spin_unlock_irq(&hugetlb_lock); @@ -3793,13 +3794,13 @@ static long demote_free_hugetlb_folios(struct hstate *src, struct hstate *dst, */ mutex_lock(&dst->resize_lock); - list_for_each_entry_safe(folio, next, src_list, lru) { + list_for_each_entry_safe(folio, next, src_list, _hugetlb_list) { int i; if (folio_test_hugetlb_vmemmap_optimized(folio)) continue; - list_del(&folio->lru); + list_del(&folio->_hugetlb_list); split_page_owner(&folio->page, huge_page_order(src), huge_page_order(dst)); pgalloc_tag_split(folio, huge_page_order(src), huge_page_order(dst)); @@ -3814,7 +3815,7 @@ static long demote_free_hugetlb_folios(struct hstate *src, struct hstate *dst, new_folio = page_folio(page); init_new_hugetlb_folio(dst, new_folio); - list_add(&new_folio->lru, &dst_list); + list_add(&new_folio->_hugetlb_list, &dst_list); } } @@ -3847,12 +3848,12 @@ static long demote_pool_huge_page(struct hstate *src, nodemask_t *nodes_allowed, LIST_HEAD(list); struct folio *folio, *next; - list_for_each_entry_safe(folio, next, &src->hugepage_freelists[node], lru) { + list_for_each_entry_safe(folio, next, &src->hugepage_freelists[node], _hugetlb_list) { if (folio_test_hwpoison(folio)) continue; remove_hugetlb_folio(src, folio, false); - list_add(&folio->lru, &list); + list_add(&folio->_hugetlb_list, &list); if (++nr_demoted == nr_to_demote) break; @@ -3864,8 +3865,8 @@ static long demote_pool_huge_page(struct hstate *src, nodemask_t *nodes_allowed, spin_lock_irq(&hugetlb_lock); - list_for_each_entry_safe(folio, next, &list, lru) { - list_del(&folio->lru); + list_for_each_entry_safe(folio, next, &list, _hugetlb_list) { + list_del(&folio->_hugetlb_list); add_hugetlb_folio(src, folio, false); nr_demoted--; @@ -7427,7 +7428,8 @@ bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list) goto unlock; } folio_clear_hugetlb_migratable(folio); - list_move_tail(&folio->lru, list); + list_del_init(&folio->_hugetlb_list); + list_add_tail(&folio->lru, list); unlock: spin_unlock_irq(&hugetlb_lock); return ret; @@ -7478,7 +7480,8 @@ void folio_putback_hugetlb(struct folio *folio) { spin_lock_irq(&hugetlb_lock); folio_set_hugetlb_migratable(folio); - list_move_tail(&folio->lru, &(folio_hstate(folio))->hugepage_activelist); + list_del_init(&folio->lru); + list_add_tail(&folio->_hugetlb_list, &(folio_hstate(folio))->hugepage_activelist); spin_unlock_irq(&hugetlb_lock); folio_put(folio); } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 1bdeaf25f640..ee720eeaf6b1 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -239,7 +239,7 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css) do { for_each_hstate(h) { spin_lock_irq(&hugetlb_lock); - list_for_each_entry(folio, &h->hugepage_activelist, lru) + list_for_each_entry(folio, &h->hugepage_activelist, _hugetlb_list) hugetlb_cgroup_move_parent(hstate_index(h), h_cg, folio); spin_unlock_irq(&hugetlb_lock); @@ -933,7 +933,7 @@ void hugetlb_cgroup_migrate(struct folio *old_folio, struct folio *new_folio) /* move the h_cg details to new cgroup */ set_hugetlb_cgroup(new_folio, h_cg); set_hugetlb_cgroup_rsvd(new_folio, h_cg_rsvd); - list_move(&new_folio->lru, &h->hugepage_activelist); + list_move(&new_folio->_hugetlb_list, &h->hugepage_activelist); 
spin_unlock_irq(&hugetlb_lock); return; } diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index 57b7f591eee8..b2cb8d328aac 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -519,7 +519,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h, long ret = 0; unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_RCU; - list_for_each_entry_safe(folio, t_folio, folio_list, lru) { + list_for_each_entry_safe(folio, t_folio, folio_list, _hugetlb_list) { if (folio_test_hugetlb_vmemmap_optimized(folio)) { ret = __hugetlb_vmemmap_restore_folio(h, folio, flags); /* only need to synchronize_rcu() once for each batch */ @@ -531,7 +531,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h, } /* Add non-optimized folios to output list */ - list_move(&folio->lru, non_hvo_folios); + list_move(&folio->_hugetlb_list, non_hvo_folios); } if (restored) @@ -651,7 +651,7 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l LIST_HEAD(vmemmap_pages); unsigned long flags = VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_RCU; - list_for_each_entry(folio, folio_list, lru) { + list_for_each_entry(folio, folio_list, _hugetlb_list) { int ret = hugetlb_vmemmap_split_folio(h, folio); /* @@ -666,7 +666,7 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l flush_tlb_all(); - list_for_each_entry(folio, folio_list, lru) { + list_for_each_entry(folio, folio_list, _hugetlb_list) { int ret; ret = __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags); -- 2.47.0.277.g8800431eea-goog
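Patch 06 parks the hugetlb-internal list head in the folio's fourth struct page slot and leans on compile-time offset checks (the FOLIO_MATCH macro) to keep that overlay honest. The sketch below models only the assertion mechanism on a deliberately simplified, made-up layout; the field names and sizes are not the kernel's.

/*
 * Minimal model of the FOLIO_MATCH-style layout assertions used in
 * patch 06: a "folio"-like view overlays an array of "page"-like slots,
 * and static assertions pin each reused field to the expected slot.
 */
#include <stddef.h>
#include <stdio.h>

struct fake_list_head {
	struct fake_list_head *next, *prev;
};

struct fake_page {			/* one page-sized metadata slot */
	unsigned long flags;
	unsigned long compound_head;
	struct fake_list_head lru;
};

struct fake_folio {			/* overlays several fake_page slots */
	/* slot 0 */
	unsigned long flags;
	unsigned long compound_head;
	struct fake_list_head lru;	/* only used while isolated */
	/* slots 1 and 2 */
	struct fake_page __page_1;
	struct fake_page __page_2;
	/* slot 3: hugetlb-internal list lives here, away from lru */
	unsigned long _flags_3;
	unsigned long _head_3;
	struct fake_list_head _hugetlb_list;
};

/* The overlay is only valid if the reused fields land in the right slot. */
#define FAKE_FOLIO_MATCH(pg, fl, slot)					\
	_Static_assert(offsetof(struct fake_folio, fl) ==		\
		       offsetof(struct fake_page, pg) +			\
		       (slot) * sizeof(struct fake_page),		\
		       "folio field does not overlay page field")

FAKE_FOLIO_MATCH(flags, _flags_3, 3);
FAKE_FOLIO_MATCH(compound_head, _head_3, 3);

int main(void)
{
	printf("folio view spans %zu page slots\n",
	       sizeof(struct fake_folio) / sizeof(struct fake_page));
	return 0;
}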
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 07/10] mm: Introduce struct folio_owner_ops
Introduce struct folio_owner_ops, a method table that contains callbacks to owners of folios that need special handling for certain operations. For now, it only contains a callback for folio free(), which is called immediately after the folio refcount drops to 0. Add a pointer to this struct overlaid on struct page compound_head, pgmap, and struct page/folio lru. The users of this struct either will not use lru (e.g., zone device), or would be able to easily isolate when lru is being used (e.g., hugetlb) and handle it accordingly. While folios are isolated, they cannot get freed and the owner_ops are unstable. This is sufficient for the current use case of returning these folios to a custom allocator. To identify that a folio has owner_ops, we set bit 1 of the field, in a similar way to that bit 0 of compound_head is used to identify compound pages. Signed-off-by: Fuad Tabba <tabba at google.com> --- include/linux/mm_types.h | 64 +++++++++++++++++++++++++++++++++++++--- mm/swap.c | 19 ++++++++++++ 2 files changed, 79 insertions(+), 4 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 365c73be0bb4..6e06286f44f1 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -41,10 +41,12 @@ struct mem_cgroup; * * If you allocate the page using alloc_pages(), you can use some of the * space in struct page for your own purposes. The five words in the main - * union are available, except for bit 0 of the first word which must be - * kept clear. Many users use this word to store a pointer to an object - * which is guaranteed to be aligned. If you use the same storage as - * page->mapping, you must restore it to NULL before freeing the page. + * union are available, except for bit 0 (used for compound_head pages) + * and bit 1 (used for owner_ops) of the first word, which must be kept + * clear and used with care. Many users use this word to store a pointer + * to an object which is guaranteed to be aligned. If you use the same + * storage as page->mapping, you must restore it to NULL before freeing + * the page. * * The mapcount field must not be used for own purposes. * @@ -283,10 +285,16 @@ typedef struct { unsigned long val; } swp_entry_t; +struct folio_owner_ops; + /** * struct folio - Represents a contiguous set of bytes. * @flags: Identical to the page flags. * @lru: Least Recently Used list; tracks how recently this folio was used. + * @owner_ops: Pointer to callback operations of the folio owner. Valid if bit 1 + * is set. + * NOTE: Cannot be used with lru, since it is overlaid with it. To use lru, + * owner_ops must be cleared first, and restored once done with lru. * @mlock_count: Number of times this folio has been pinned by mlock(). * @mapping: The file this page belongs to, or refers to the anon_vma for * anonymous memory. @@ -330,6 +338,7 @@ struct folio { unsigned long flags; union { struct list_head lru; + const struct folio_owner_ops *owner_ops; /* Bit 1 is set */ /* private: avoid cluttering the output */ struct { void *__filler; @@ -417,6 +426,7 @@ FOLIO_MATCH(flags, flags); FOLIO_MATCH(lru, lru); FOLIO_MATCH(mapping, mapping); FOLIO_MATCH(compound_head, lru); +FOLIO_MATCH(compound_head, owner_ops); FOLIO_MATCH(index, index); FOLIO_MATCH(private, private); FOLIO_MATCH(_mapcount, _mapcount); @@ -452,6 +462,13 @@ FOLIO_MATCH(flags, _flags_3); FOLIO_MATCH(compound_head, _head_3); #undef FOLIO_MATCH +struct folio_owner_ops { + /* + * Called once the folio refcount reaches 0. 
+ */ + void (*free)(struct folio *folio); +}; + /** * struct ptdesc - Memory descriptor for page tables. * @__page_flags: Same as page flags. Powerpc only. @@ -560,6 +577,45 @@ static inline void *folio_get_private(struct folio *folio) return folio->private; } +/* + * Use bit 1, since bit 0 is used to indicate a compound page in compound_head, + * which owner_ops is overlaid with. + */ +#define FOLIO_OWNER_OPS_BIT 1UL +#define FOLIO_OWNER_OPS (1UL << FOLIO_OWNER_OPS_BIT) + +/* + * Set the folio owner_ops as well as bit 1 of the pointer to indicate that the + * folio has owner_ops. + */ +static inline void folio_set_owner_ops(struct folio *folio, const struct folio_owner_ops *owner_ops) +{ + owner_ops = (const struct folio_owner_ops *)((unsigned long)owner_ops | FOLIO_OWNER_OPS); + folio->owner_ops = owner_ops; +} + +/* + * Clear the folio owner_ops including bit 1 of the pointer. + */ +static inline void folio_clear_owner_ops(struct folio *folio) +{ + folio->owner_ops = NULL; +} + +/* + * Return the folio's owner_ops if it has them, otherwise, return NULL. + */ +static inline const struct folio_owner_ops *folio_get_owner_ops(struct folio *folio) +{ + const struct folio_owner_ops *owner_ops = folio->owner_ops; + + if (!((unsigned long)owner_ops & FOLIO_OWNER_OPS)) + return NULL; + + owner_ops = (const struct folio_owner_ops *)((unsigned long)owner_ops & ~FOLIO_OWNER_OPS); + return owner_ops; +} + struct page_frag_cache { void * va; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) diff --git a/mm/swap.c b/mm/swap.c index 638a3f001676..767ff6d8f47b 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -110,6 +110,13 @@ static void page_cache_release(struct folio *folio) void __folio_put(struct folio *folio) { + const struct folio_owner_ops *owner_ops = folio_get_owner_ops(folio); + + if (unlikely(owner_ops)) { + owner_ops->free(folio); + return; + } + if (unlikely(folio_is_zone_device(folio))) { free_zone_device_folio(folio); return; @@ -929,10 +936,22 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) for (i = 0, j = 0; i < folios->nr; i++) { struct folio *folio = folios->folios[i]; unsigned int nr_refs = refs ? refs[i] : 1; + const struct folio_owner_ops *owner_ops; if (is_huge_zero_folio(folio)) continue; + owner_ops = folio_get_owner_ops(folio); + if (unlikely(owner_ops)) { + if (lruvec) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; + } + if (folio_ref_sub_and_test(folio, nr_refs)) + owner_ops->free(folio); + continue; + } + if (folio_is_zone_device(folio)) { if (lruvec) { unlock_page_lruvec_irqrestore(lruvec, flags); -- 2.47.0.277.g8800431eea-goog
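The behavioural core of patch 07 is the new dispatch in __folio_put() and folios_put_refs(): once the last reference is gone, a folio that carries owner_ops gets its free() callback instead of the default release path. A user-space model of that dispatch, using illustrative names and a plain integer refcount rather than the kernel's folio refcount:

/*
 * User-space model of the __folio_put() dispatch added in patch 07:
 * when the last reference is dropped, an object with owner_ops gets its
 * free() callback instead of the default release path.
 */
#include <stdio.h>
#include <stdlib.h>

struct fake_folio;

struct fake_owner_ops {
	void (*free)(struct fake_folio *f);	/* called at refcount 0 */
};

struct fake_folio {
	int refcount;
	const struct fake_owner_ops *owner_ops;	/* NULL for ordinary folios */
};

/* Default release path, used when no owner claimed the folio. */
static void default_free(struct fake_folio *f)
{
	printf("default free of %p\n", (void *)f);
	free(f);
}

/* Model of folio_put(): drop a reference, dispatch on zero. */
static void fake_folio_put(struct fake_folio *f)
{
	if (--f->refcount)
		return;
	if (f->owner_ops) {			/* owner gets the callback... */
		f->owner_ops->free(f);
		return;				/* ...and nothing else runs */
	}
	default_free(f);
}

/* Example owner: return the folio to a private pool instead of freeing it. */
static void pool_return(struct fake_folio *f)
{
	printf("owner reclaims %p into its pool\n", (void *)f);
	f->refcount = 1;	/* ready to be handed out again */
}

static const struct fake_owner_ops pool_ops = { .free = pool_return };

int main(void)
{
	struct fake_folio *plain = calloc(1, sizeof(*plain));
	struct fake_folio *owned = calloc(1, sizeof(*owned));

	plain->refcount = 1;
	owned->refcount = 1;
	owned->owner_ops = &pool_ops;

	fake_folio_put(plain);	/* default path frees it */
	fake_folio_put(owned);	/* owner callback runs instead */
	free(owned);
	return 0;
}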
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 08/10] mm: Use getters and setters to access page pgmap
The pointer to pgmap in struct page is overlaid with folio owner_ops. To indicate that a page/folio has owner ops, bit 1 is set. Therefore, before we can start to using owner_ops, we need to ensure that all accesses to page pgmap sanitize the pointer value. This patch introduces the accessors, which will be modified in the following patch to sanitize the pointer values. No functional change intended. Signed-off-by: Fuad Tabba <tabba at google.com> --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 4 +++- drivers/pci/p2pdma.c | 8 +++++--- include/linux/memremap.h | 6 +++--- include/linux/mm_types.h | 13 +++++++++++++ lib/test_hmm.c | 2 +- mm/hmm.c | 2 +- mm/memory.c | 2 +- mm/memremap.c | 19 +++++++++++-------- mm/migrate_device.c | 4 ++-- mm/mm_init.c | 2 +- 10 files changed, 41 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 1a072568cef6..d7d9d9476bb0 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -88,7 +88,9 @@ struct nouveau_dmem { static struct nouveau_dmem_chunk *nouveau_page_to_chunk(struct page *page) { - return container_of(page->pgmap, struct nouveau_dmem_chunk, pagemap); + struct dev_pagemap *pgmap = page_get_pgmap(page); + + return container_of(pgmap, struct nouveau_dmem_chunk, pagemap); } static struct nouveau_drm *page_to_drm(struct page *page) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 4f47a13cb500..19519bb4ba56 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -193,7 +193,7 @@ static const struct attribute_group p2pmem_group = { static void p2pdma_page_free(struct page *page) { - struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap); + struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page_get_pgmap(page)); /* safe to dereference while a reference is held to the percpu ref */ struct pci_p2pdma *p2pdma rcu_dereference_protected(pgmap->provider->p2pdma, 1); @@ -1016,8 +1016,10 @@ enum pci_p2pdma_map_type pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev, struct scatterlist *sg) { - if (state->pgmap != sg_page(sg)->pgmap) { - state->pgmap = sg_page(sg)->pgmap; + struct dev_pagemap *pgmap = page_get_pgmap(sg_page(sg)); + + if (state->pgmap != pgmap) { + state->pgmap = pgmap; state->map = pci_p2pdma_map_type(state->pgmap, dev); state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset; } diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 3f7143ade32c..060e27b6aee0 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -161,7 +161,7 @@ static inline bool is_device_private_page(const struct page *page) { return IS_ENABLED(CONFIG_DEVICE_PRIVATE) && is_zone_device_page(page) && - page->pgmap->type == MEMORY_DEVICE_PRIVATE; + page_get_pgmap(page)->type == MEMORY_DEVICE_PRIVATE; } static inline bool folio_is_device_private(const struct folio *folio) @@ -173,13 +173,13 @@ static inline bool is_pci_p2pdma_page(const struct page *page) { return IS_ENABLED(CONFIG_PCI_P2PDMA) && is_zone_device_page(page) && - page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; + page_get_pgmap(page)->type == MEMORY_DEVICE_PCI_P2PDMA; } static inline bool is_device_coherent_page(const struct page *page) { return is_zone_device_page(page) && - page->pgmap->type == MEMORY_DEVICE_COHERENT; + page_get_pgmap(page)->type == MEMORY_DEVICE_COHERENT; } static inline bool folio_is_device_coherent(const struct folio *folio) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 
6e06286f44f1..27075ea24e67 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -616,6 +616,19 @@ static inline const struct folio_owner_ops *folio_get_owner_ops(struct folio *fo return owner_ops; } +/* + * Get the page dev_pagemap pgmap pointer. + */ +#define page_get_pgmap(page) ((page)->pgmap) + +/* + * Set the page dev_pagemap pgmap pointer. + */ +static inline void page_set_pgmap(struct page *page, struct dev_pagemap *pgmap) +{ + page->pgmap = pgmap; +} + struct page_frag_cache { void * va; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 056f2e411d7b..d3e3843f57dd 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -195,7 +195,7 @@ static int dmirror_fops_release(struct inode *inode, struct file *filp) static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page) { - return container_of(page->pgmap, struct dmirror_chunk, pagemap); + return container_of(page_get_pgmap(page), struct dmirror_chunk, pagemap); } static struct dmirror_device *dmirror_page_to_device(struct page *page) diff --git a/mm/hmm.c b/mm/hmm.c index 7e0229ae4a5a..b5f5ac218fda 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -248,7 +248,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, * just report the PFN. */ if (is_device_private_entry(entry) && - pfn_swap_entry_to_page(entry)->pgmap->owner =+ page_get_pgmap(pfn_swap_entry_to_page(entry))->owner = range->dev_private_owner) { cpu_flags = HMM_PFN_VALID; if (is_writable_device_private_entry(entry)) diff --git a/mm/memory.c b/mm/memory.c index 80850cad0e6f..5853fa5767c7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4276,7 +4276,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ get_page(vmf->page); pte_unmap_unlock(vmf->pte, vmf->ptl); - ret = vmf->page->pgmap->ops->migrate_to_ram(vmf); + ret = page_get_pgmap(vmf->page)->ops->migrate_to_ram(vmf); put_page(vmf->page); } else if (is_hwpoison_entry(entry)) { ret = VM_FAULT_HWPOISON; diff --git a/mm/memremap.c b/mm/memremap.c index 40d4547ce514..931bc85da1df 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -458,8 +458,9 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap); void free_zone_device_folio(struct folio *folio) { - if (WARN_ON_ONCE(!folio->page.pgmap->ops || - !folio->page.pgmap->ops->page_free)) + struct dev_pagemap *pgmap = page_get_pgmap(&folio->page); + + if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free)) return; mem_cgroup_uncharge(folio); @@ -486,17 +487,17 @@ void free_zone_device_folio(struct folio *folio) * to clear folio->mapping. */ folio->mapping = NULL; - folio->page.pgmap->ops->page_free(folio_page(folio, 0)); + pgmap->ops->page_free(folio_page(folio, 0)); - if (folio->page.pgmap->type != MEMORY_DEVICE_PRIVATE && - folio->page.pgmap->type != MEMORY_DEVICE_COHERENT) + if (pgmap->type != MEMORY_DEVICE_PRIVATE && + pgmap->type != MEMORY_DEVICE_COHERENT) /* * Reset the refcount to 1 to prepare for handing out the page * again. */ folio_set_count(folio, 1); else - put_dev_pagemap(folio->page.pgmap); + put_dev_pagemap(pgmap); } void zone_device_page_init(struct page *page) @@ -505,7 +506,7 @@ void zone_device_page_init(struct page *page) * Drivers shouldn't be allocating pages after calling * memunmap_pages(). 
*/ - WARN_ON_ONCE(!percpu_ref_tryget_live(&page->pgmap->ref)); + WARN_ON_ONCE(!percpu_ref_tryget_live(&page_get_pgmap(page)->ref)); set_page_count(page, 1); lock_page(page); } @@ -514,7 +515,9 @@ EXPORT_SYMBOL_GPL(zone_device_page_init); #ifdef CONFIG_FS_DAX bool __put_devmap_managed_folio_refs(struct folio *folio, int refs) { - if (folio->page.pgmap->type != MEMORY_DEVICE_FS_DAX) + struct dev_pagemap *pgmap = page_get_pgmap(&folio->page); + + if (pgmap->type != MEMORY_DEVICE_FS_DAX) return false; /* diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 9cf26592ac93..368def358d02 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -135,7 +135,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, page = pfn_swap_entry_to_page(entry); if (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE) || - page->pgmap->owner != migrate->pgmap_owner) + page_get_pgmap(page)->owner != migrate->pgmap_owner) goto next; mpfn = migrate_pfn(page_to_pfn(page)) | @@ -156,7 +156,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, goto next; else if (page && is_device_coherent_page(page) && (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) || - page->pgmap->owner != migrate->pgmap_owner)) + page_get_pgmap(page)->owner != migrate->pgmap_owner)) goto next; mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE; mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0; diff --git a/mm/mm_init.c b/mm/mm_init.c index 1c205b0a86ed..279cdaebfd2b 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -995,7 +995,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, * and zone_device_data. It is a bug if a ZONE_DEVICE page is * ever freed or placed on a driver-private list. */ - page->pgmap = pgmap; + page_set_pgmap(page, pgmap); page->zone_device_data = NULL; /* -- 2.47.0.277.g8800431eea-goog
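Patch 08 is deliberately mechanical: every raw page->pgmap access now goes through page_get_pgmap()/page_set_pgmap(), so that the next patch can change what is actually stored in that word without touching any caller. The sketch below shows the shape of that indirection once the tag is added; names and types are stand-ins, and in the patch-08 stage the accessors are still a plain load and store.

/*
 * Sketch of the accessor indirection introduced in patch 08: callers use
 * get/set helpers instead of touching the raw field, so patch 09 can start
 * storing a tagged pointer without changing any caller.
 */
#include <assert.h>
#include <stdint.h>

struct fake_pagemap { int type; };

struct fake_page {
	/* raw storage; after patch 09 this word also carries a tag in bit 1 */
	uintptr_t pgmap_word;
};

#define FAKE_TAG	(1UL << 1)

/* Patch-08 shape: accessors are a plain load/store.
 * Patch-09 shape (shown here): the same accessors add and strip the tag. */
static void fake_set_pgmap(struct fake_page *p, struct fake_pagemap *pgmap)
{
	p->pgmap_word = (uintptr_t)pgmap | FAKE_TAG;
}

static struct fake_pagemap *fake_get_pgmap(const struct fake_page *p)
{
	return (struct fake_pagemap *)(p->pgmap_word & ~FAKE_TAG);
}

int main(void)
{
	struct fake_pagemap pm = { .type = 42 };
	struct fake_page page;

	fake_set_pgmap(&page, &pm);
	/* A caller written against the accessor never sees the tag bit. */
	assert(fake_get_pgmap(&page)->type == 42);
	return 0;
}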
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 09/10] mm: Use owner_ops on folio_put for zone device pages
Now that we have the folio_owner_ops callback, use it for zone device pages instead of using a dedicated callback. Note that struct dev_pagemap (pgmap) in struct page is overlaid with struct folio owner_ops. Therefore, make struct dev_pagemap contain an instance of struct folio_owner_ops, to handle it the same way as struct folio_owner_ops. Also note that, although struct dev_pagemap_ops has a page_free() function, it does not have the same intention as the folio_owner_ops free() callback nor does it have the same behavior. The page_free() function is used as an optional callback to drivers that use zone device to inform them of the freeing of the page. Signed-off-by: Fuad Tabba <tabba at google.com> --- include/linux/memremap.h | 8 +++++++ include/linux/mm_types.h | 16 ++++++++++++-- mm/internal.h | 1 - mm/memremap.c | 44 -------------------------------------- mm/mm_init.c | 46 ++++++++++++++++++++++++++++++++++++++++ mm/swap.c | 18 ++-------------- 6 files changed, 70 insertions(+), 63 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 060e27b6aee0..5b68bbc588a3 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -106,6 +106,7 @@ struct dev_pagemap_ops { /** * struct dev_pagemap - metadata for ZONE_DEVICE mappings + * @folio_ops: method table for folio operations. * @altmap: pre-allocated/reserved memory for vmemmap allocations * @ref: reference count that pins the devm_memremap_pages() mapping * @done: completion for @ref @@ -125,6 +126,7 @@ struct dev_pagemap_ops { * @ranges: array of ranges to be mapped when nr_range > 1 */ struct dev_pagemap { + struct folio_owner_ops folio_ops; struct vmem_altmap altmap; struct percpu_ref ref; struct completion done; @@ -140,6 +142,12 @@ struct dev_pagemap { }; }; +/* + * The folio_owner_ops structure needs to be first since pgmap in struct page is + * overlaid with owner_ops in struct folio. + */ +static_assert(offsetof(struct dev_pagemap, folio_ops) == 0); + static inline bool pgmap_has_memory_failure(struct dev_pagemap *pgmap) { return pgmap->ops && pgmap->ops->memory_failure; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 27075ea24e67..a72fda20d5e9 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -427,6 +427,7 @@ FOLIO_MATCH(lru, lru); FOLIO_MATCH(mapping, mapping); FOLIO_MATCH(compound_head, lru); FOLIO_MATCH(compound_head, owner_ops); +FOLIO_MATCH(pgmap, owner_ops); FOLIO_MATCH(index, index); FOLIO_MATCH(private, private); FOLIO_MATCH(_mapcount, _mapcount); @@ -618,15 +619,26 @@ static inline const struct folio_owner_ops *folio_get_owner_ops(struct folio *fo /* * Get the page dev_pagemap pgmap pointer. + * + * The page pgmap is overlaid with the folio owner_ops, where bit 1 is used to + * indicate that the page/folio has owner ops. The dev_pagemap contains + * owner_ops and is handled the same way. The getter returns a sanitized + * pointer. */ -#define page_get_pgmap(page) ((page)->pgmap) +#define page_get_pgmap(page) \ + ((struct dev_pagemap *)((unsigned long)(page)->pgmap & ~FOLIO_OWNER_OPS)) /* * Set the page dev_pagemap pgmap pointer. + * + * The page pgmap is overlaid with the folio owner_ops, where bit 1 is used to + * indicate that the page/folio has owner ops. The dev_pagemap contains + * owner_ops and is handled the same way. The setter sets bit 1 to indicate + * that the page owner_ops. 
*/ static inline void page_set_pgmap(struct page *page, struct dev_pagemap *pgmap) { - page->pgmap = pgmap; + page->pgmap = (struct dev_pagemap *)((unsigned long)pgmap | FOLIO_OWNER_OPS); } struct page_frag_cache { diff --git a/mm/internal.h b/mm/internal.h index 5a7302baeed7..a041247bed10 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1262,7 +1262,6 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf, unsigned long addr, int *flags, bool writable, int *last_cpupid); -void free_zone_device_folio(struct folio *folio); int migrate_device_coherent_folio(struct folio *folio); struct vm_struct *__get_vm_area_node(unsigned long size, diff --git a/mm/memremap.c b/mm/memremap.c index 931bc85da1df..9fd5f57219eb 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -456,50 +456,6 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn, } EXPORT_SYMBOL_GPL(get_dev_pagemap); -void free_zone_device_folio(struct folio *folio) -{ - struct dev_pagemap *pgmap = page_get_pgmap(&folio->page); - - if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free)) - return; - - mem_cgroup_uncharge(folio); - - /* - * Note: we don't expect anonymous compound pages yet. Once supported - * and we could PTE-map them similar to THP, we'd have to clear - * PG_anon_exclusive on all tail pages. - */ - if (folio_test_anon(folio)) { - VM_BUG_ON_FOLIO(folio_test_large(folio), folio); - __ClearPageAnonExclusive(folio_page(folio, 0)); - } - - /* - * When a device managed page is freed, the folio->mapping field - * may still contain a (stale) mapping value. For example, the - * lower bits of folio->mapping may still identify the folio as an - * anonymous folio. Ultimately, this entire field is just stale - * and wrong, and it will cause errors if not cleared. - * - * For other types of ZONE_DEVICE pages, migration is either - * handled differently or not done at all, so there is no need - * to clear folio->mapping. - */ - folio->mapping = NULL; - pgmap->ops->page_free(folio_page(folio, 0)); - - if (pgmap->type != MEMORY_DEVICE_PRIVATE && - pgmap->type != MEMORY_DEVICE_COHERENT) - /* - * Reset the refcount to 1 to prepare for handing out the page - * again. - */ - folio_set_count(folio, 1); - else - put_dev_pagemap(pgmap); -} - void zone_device_page_init(struct page *page) { /* diff --git a/mm/mm_init.c b/mm/mm_init.c index 279cdaebfd2b..47c1f8fd4914 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -974,6 +974,51 @@ static void __init memmap_init(void) } #ifdef CONFIG_ZONE_DEVICE + +static void free_zone_device_folio(struct folio *folio) +{ + struct dev_pagemap *pgmap = page_get_pgmap(&folio->page); + + if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free)) + return; + + mem_cgroup_uncharge(folio); + + /* + * Note: we don't expect anonymous compound pages yet. Once supported + * and we could PTE-map them similar to THP, we'd have to clear + * PG_anon_exclusive on all tail pages. + */ + if (folio_test_anon(folio)) { + VM_BUG_ON_FOLIO(folio_test_large(folio), folio); + __ClearPageAnonExclusive(folio_page(folio, 0)); + } + + /* + * When a device managed page is freed, the folio->mapping field + * may still contain a (stale) mapping value. For example, the + * lower bits of folio->mapping may still identify the folio as an + * anonymous folio. Ultimately, this entire field is just stale + * and wrong, and it will cause errors if not cleared. + * + * For other types of ZONE_DEVICE pages, migration is either + * handled differently or not done at all, so there is no need + * to clear folio->mapping. 
+ */ + folio->mapping = NULL; + pgmap->ops->page_free(folio_page(folio, 0)); + + if (pgmap->type != MEMORY_DEVICE_PRIVATE && + pgmap->type != MEMORY_DEVICE_COHERENT) + /* + * Reset the refcount to 1 to prepare for handing out the page + * again. + */ + folio_set_count(folio, 1); + else + put_dev_pagemap(pgmap); +} + static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, unsigned long zone_idx, int nid, struct dev_pagemap *pgmap) @@ -995,6 +1040,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, * and zone_device_data. It is a bug if a ZONE_DEVICE page is * ever freed or placed on a driver-private list. */ + pgmap->folio_ops.free = free_zone_device_folio; page_set_pgmap(page, pgmap); page->zone_device_data = NULL; diff --git a/mm/swap.c b/mm/swap.c index 767ff6d8f47b..d2578465e270 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -117,11 +117,6 @@ void __folio_put(struct folio *folio) return; } - if (unlikely(folio_is_zone_device(folio))) { - free_zone_device_folio(folio); - return; - } - if (folio_test_hugetlb(folio)) { free_huge_folio(folio); return; @@ -947,20 +942,11 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) unlock_page_lruvec_irqrestore(lruvec, flags); lruvec = NULL; } - if (folio_ref_sub_and_test(folio, nr_refs)) - owner_ops->free(folio); - continue; - } - - if (folio_is_zone_device(folio)) { - if (lruvec) { - unlock_page_lruvec_irqrestore(lruvec, flags); - lruvec = NULL; - } + /* fenced by folio_is_zone_device() */ if (put_devmap_managed_folio_refs(folio, nr_refs)) continue; if (folio_ref_sub_and_test(folio, nr_refs)) - free_zone_device_folio(folio); + owner_ops->free(folio); continue; } -- 2.47.0.277.g8800431eea-goog
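Patch 09 depends on struct dev_pagemap beginning with its folio_owner_ops instance, so the one tagged word in struct page can be read either as the owner_ops (same address) or, after masking, as the pgmap that contains them. A user-space model of that first-member embedding, with illustrative names:

/*
 * User-space model of the "first member at offset 0" trick from patch 09:
 * because the ops structure is the first field of the containing pagemap,
 * a pointer to the pagemap is also a valid pointer to its ops, and the
 * container can be recovered from the ops pointer.
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct fake_owner_ops {
	void (*free)(void *folio);
};

struct fake_pagemap {
	struct fake_owner_ops folio_ops;	/* must stay the first member */
	int type;
};

/* The overlay only works if the ops really sit at offset 0. */
_Static_assert(offsetof(struct fake_pagemap, folio_ops) == 0,
	       "folio_ops must be the first member");

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void device_free(void *folio)
{
	printf("device-specific free of %p\n", folio);
}

int main(void)
{
	struct fake_pagemap pm = {
		.folio_ops = { .free = device_free },
		.type = 7,
	};
	int folio;

	/* One address, two views: the ops... */
	const struct fake_owner_ops *ops = &pm.folio_ops;
	ops->free(&folio);

	/* ...and, when needed, the pagemap that carries them. */
	struct fake_pagemap *back = container_of(ops, struct fake_pagemap, folio_ops);
	assert(back == &pm && back->type == 7);
	return 0;
}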
Fuad Tabba
2024-Nov-08 16:20 UTC
[RFC PATCH v1 10/10] mm: hugetlb: Use owner_ops on folio_put for hugetlb
Now that we have the folio_owner_ops callback, use it for hugetlb pages instead of using a dedicated callback. Since owner_ops is overlaid with lru, we need to unset owner_ops to allow the use of lru when it is isolated. At that point we know that the reference count is elevated and will not reach 0, and thus will not trigger the callback. Therefore, it is safe to do so, provided we restore it before we put the folio back. Signed-off-by: Fuad Tabba <tabba at google.com> --- include/linux/hugetlb.h | 2 -- mm/hugetlb.c | 57 +++++++++++++++++++++++++++++++++-------- mm/swap.c | 14 ---------- 3 files changed, 47 insertions(+), 26 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index e846d7dac77c..500848862702 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -20,8 +20,6 @@ struct user_struct; struct mmu_gather; struct node; -void free_huge_folio(struct folio *folio); - #ifdef CONFIG_HUGETLB_PAGE #include <linux/pagemap.h> diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2308e94d8615..4e1c87e37968 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -89,6 +89,33 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma); static void hugetlb_unshare_pmds(struct vm_area_struct *vma, unsigned long start, unsigned long end); static struct resv_map *vma_resv_map(struct vm_area_struct *vma); +static void free_huge_folio(struct folio *folio); +
+static const struct folio_owner_ops hugetlb_owner_ops = { + .free = free_huge_folio, +}; +
+/* + * Mark this folio as a hugetlb-owned folio. + * + * Set the folio hugetlb flag and owner operations. + */ +static void folio_set_hugetlb_owner(struct folio *folio) +{ + __folio_set_hugetlb(folio); + folio_set_owner_ops(folio, &hugetlb_owner_ops); +} +
+/* + * Unmark this folio from being a hugetlb-owned folio. + * + * Clear the folio hugetlb flag and owner operations. + */ +static void folio_clear_hugetlb_owner(struct folio *folio) +{ + folio_clear_owner_ops(folio); + __folio_clear_hugetlb(folio); +} static void hugetlb_free_folio(struct folio *folio) { @@ -1617,7 +1644,7 @@ static void remove_hugetlb_folio(struct hstate *h, struct folio *folio, * to tail struct pages. 
*/ if (!folio_test_hugetlb_vmemmap_optimized(folio)) { - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); } h->nr_huge_pages--; @@ -1641,7 +1668,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio, h->surplus_huge_pages++; h->surplus_huge_pages_node[nid]++; } - __folio_set_hugetlb(folio); + folio_set_hugetlb_owner(folio); folio_change_private(folio, NULL); /* @@ -1692,7 +1719,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h, */ if (folio_test_hugetlb(folio)) { spin_lock_irq(&hugetlb_lock); - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); spin_unlock_irq(&hugetlb_lock); } @@ -1793,7 +1820,7 @@ static void bulk_vmemmap_restore_error(struct hstate *h, list_for_each_entry_safe(folio, t_folio, non_hvo_folios, _hugetlb_list) { list_del(&folio->_hugetlb_list); spin_lock_irq(&hugetlb_lock); - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); spin_unlock_irq(&hugetlb_lock); update_and_free_hugetlb_folio(h, folio, false); cond_resched(); @@ -1818,7 +1845,7 @@ static void bulk_vmemmap_restore_error(struct hstate *h, } else { list_del(&folio->_hugetlb_list); spin_lock_irq(&hugetlb_lock); - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); spin_unlock_irq(&hugetlb_lock); update_and_free_hugetlb_folio(h, folio, false); cond_resched(); @@ -1851,14 +1878,14 @@ static void update_and_free_pages_bulk(struct hstate *h, * should only be pages on the non_hvo_folios list. * Do note that the non_hvo_folios list could be empty. * Without HVO enabled, ret will be 0 and there is no need to call - * __folio_clear_hugetlb as this was done previously. + * folio_clear_hugetlb_owner as this was done previously. */ VM_WARN_ON(!list_empty(folio_list)); VM_WARN_ON(ret < 0); if (!list_empty(&non_hvo_folios) && ret) { spin_lock_irq(&hugetlb_lock); list_for_each_entry(folio, &non_hvo_folios, _hugetlb_list) - __folio_clear_hugetlb(folio); + folio_clear_hugetlb_owner(folio); spin_unlock_irq(&hugetlb_lock); } @@ -1879,7 +1906,7 @@ struct hstate *size_to_hstate(unsigned long size) return NULL; } -void free_huge_folio(struct folio *folio) +static void free_huge_folio(struct folio *folio) { /* * Can't pass hstate in here because it is called from the @@ -1959,7 +1986,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid) static void init_new_hugetlb_folio(struct hstate *h, struct folio *folio) { - __folio_set_hugetlb(folio); + folio_set_hugetlb_owner(folio); INIT_LIST_HEAD(&folio->_hugetlb_list); hugetlb_set_folio_subpool(folio, NULL); set_hugetlb_cgroup(folio, NULL); @@ -7428,6 +7455,14 @@ bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list) goto unlock; } folio_clear_hugetlb_migratable(folio); + /* + * Clear folio->owner_ops; now we can use folio->lru. + * Note that the folio cannot get freed because we are holding a + * reference. The reference will be put in folio_putback_hugetlb(), + * after restoring folio->owner_ops. + */ + folio_clear_owner_ops(folio); + INIT_LIST_HEAD(&folio->lru); list_del_init(&folio->_hugetlb_list); list_add_tail(&folio->lru, list); unlock: @@ -7480,7 +7515,9 @@ void folio_putback_hugetlb(struct folio *folio) { spin_lock_irq(&hugetlb_lock); folio_set_hugetlb_migratable(folio); - list_del_init(&folio->lru); + list_del(&folio->lru); + /* Restore folio->owner_ops since we can no longer use folio->lru. 
*/ + folio_set_owner_ops(folio, &hugetlb_owner_ops); list_add_tail(&folio->_hugetlb_list, &(folio_hstate(folio))->hugepage_activelist); spin_unlock_irq(&hugetlb_lock); folio_put(folio); diff --git a/mm/swap.c b/mm/swap.c index d2578465e270..9798ca47f26a 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -117,11 +117,6 @@ void __folio_put(struct folio *folio) return; } - if (folio_test_hugetlb(folio)) { - free_huge_folio(folio); - return; - } - page_cache_release(folio); folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); @@ -953,15 +948,6 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) if (!folio_ref_sub_and_test(folio, nr_refs)) continue; - /* hugetlb has its own memcg */ - if (folio_test_hugetlb(folio)) { - if (lruvec) { - unlock_page_lruvec_irqrestore(lruvec, flags); - lruvec = NULL; - } - free_huge_folio(folio); - continue; - } folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags); -- 2.47.0.277.g8800431eea-goog
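[Editorial note, not part of the patch mail above or below: the safety argument in the commit message hinges on ordering. Clearing owner_ops is only safe while the isolation reference pins the folio, and the ops must be restored before that reference is dropped. A minimal sketch of that ordering follows; the example_* wrappers are hypothetical, locking and hstate list handling are omitted, and the real functions are folio_isolate_hugetlb() and folio_putback_hugetlb() shown in the diff.]

/*
 * Sketch of the ordering the isolate/putback pair relies on.
 * Hypothetical wrappers; not the code in mm/hugetlb.c.
 */
static void example_isolate(struct folio *folio, struct list_head *list)
{
	folio_get(folio);		/* pin: refcount cannot reach 0 until putback */
	folio_clear_owner_ops(folio);	/* frees up folio->lru for list use */
	INIT_LIST_HEAD(&folio->lru);
	list_add_tail(&folio->lru, list);
}

static void example_putback(struct folio *folio)
{
	list_del(&folio->lru);
	/* Restore the ops before dropping the pin taken at isolation. */
	folio_set_owner_ops(folio, &hugetlb_owner_ops);
	folio_put(folio);	/* may now trigger free_huge_folio() via owner_ops */
}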
Jason Gunthorpe
2024-Nov-08 17:05 UTC
[RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops
On Fri, Nov 08, 2024 at 04:20:30PM +0000, Fuad Tabba wrote:
> Some folios, such as hugetlb folios and zone device folios,
> require special handling when the folio's reference count reaches
> 0, before being freed. Moreover, guest_memfd folios will likely
> require special handling to notify it once a folio's reference
> count reaches 0, to facilitate shared to private folio conversion
> [*]. Currently, each usecase has a dedicated callback when the
> folio refcount reaches 0 to that effect. Adding yet more
> callbacks is not ideal.

Honestly, I question this thesis. How complex would it be to have 'yet more callbacks'?

Is the challenge really that the mm can't detect when guestmemfd is the owner of the page because the page will be ZONE_NORMAL? So the point of this is really to allow ZONE_NORMAL pages to have a per-allocator callback?

But this is also why I suggested shifting them to ZONE_DEVICE for guestmemfd, because then you get these things for free from the pgmap.

(This is not a disagreement; it is a valid solution. It is a request that you explain much more about what it is you actually need and compare it with the other existing options.)

Jason
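[Editorial note, not part of the mail above: the alternative Jason refers to is the per-allocator free hook a ZONE_DEVICE owner already gets through its dev_pagemap. The sketch below shows the shape of that existing hook only; the guest_memfd names and the pagemap type chosen are hypothetical.]

/*
 * Existing ZONE_DEVICE mechanism: dev_pagemap_ops::page_free is invoked
 * when a device page's reference count drops to zero.  The guest_memfd
 * names here are made up for illustration.
 */
#include <linux/memremap.h>

static void guest_memfd_page_free(struct page *page)
{
	/* Notification point: the page is no longer referenced anywhere. */
}

static const struct dev_pagemap_ops guest_memfd_pgmap_ops = {
	.page_free = guest_memfd_page_free,
};

static struct dev_pagemap guest_memfd_pgmap = {
	.type = MEMORY_DEVICE_PRIVATE,	/* assumption; the right type would need discussion */
	.ops  = &guest_memfd_pgmap_ops,
};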