David Hildenbrand
2024-Nov-12 14:22 UTC
[RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops
On 12.11.24 14:53, Jason Gunthorpe wrote:
> On Tue, Nov 12, 2024 at 10:10:06AM +0100, David Hildenbrand wrote:
>> On 12.11.24 06:26, Matthew Wilcox wrote:
>>> On Mon, Nov 11, 2024 at 08:26:54AM +0000, Fuad Tabba wrote:
>>>> Thanks for your comments Jason, and for clarifying my cover letter
>>>> David. I think David has covered everything, and I'll make sure to
>>>> clarify this in the cover letter when I respin.
>>>
>>> I don't want you to respin. I think this is a bad idea.
>>
>> I'm hoping you'll find some more time to explain what exactly you don't
>> like, because this series only refactors what we already have.
>>
>> I enjoy seeing the special casing (especially hugetlb) gone from mm/swap.c.
>>
>> I don't particularly enjoy overlaying folio->lru, primarily because we have
>> to temporarily "evacuate" it when someone wants to make use of folio->lru
>> (e.g., hugetlb isolation). So it's not completely "sticky", at least for
>> hugetlb.
>
> This is really the worst part of it though

Yes.

> And, IMHO, seems like overkill. We have only a handful of cases -
> maybe we shouldn't be trying to get to full generality but just handle
> a couple of cases directly? I don't really think it is such a bad
> thing to have an if ladder on the free path if we have only a couple
> things. Certainly it looks good instead of doing overlaying tricks.

I'd really like to abstract hugetlb handling if possible. The way it stands
it's just very odd.

We'll need some reliable way to identify these folios that need care.
guest_memfd will be using folio->mapcount for now, so for now we couldn't
set a page type like hugetlb does.

> Also how does this translate to Matthew's memdesc world?

guest_memfd and hugetlb would be operating on folios (at least for now),
which contain the refcount, lru, private, ... so nothing special there.

Once we actually decoupled "struct folio" from "struct page", we *might*
have to play less tricks, because we could just have a callback pointer
there.
But well, maybe we also want to save some space in there.

Do we want dedicated memdescs for hugetlb/guest_memfd that extend folios
in the future? I don't know, maybe.

I'm currently wondering if we can use folio->private for the time being.
Either

(a) If folio->private is still set once the refcount drops to 0, it
indicates that there is a freeing callback/owner_ops. We'll have to make
hugetlb not use folio->private and convert others to clear folio->private
before freeing.

(b) Use bitX of folio->private to indicate that this has "owner_ops"
meaning. We'll have to make hugetlb not use folio->private and make others
not use bitX. Might be harder and overkill, because right now we only
really need the callback when refcount==0.

(c) Use some other indication that folio->private contains folio_ops.

-- 
Cheers,

David / dhildenb
Matthew Wilcox
2024-Nov-13 04:57 UTC
[RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops
On Tue, Nov 12, 2024 at 03:22:46PM +0100, David Hildenbrand wrote:
> On 12.11.24 14:53, Jason Gunthorpe wrote:
> > On Tue, Nov 12, 2024 at 10:10:06AM +0100, David Hildenbrand wrote:
> > > On 12.11.24 06:26, Matthew Wilcox wrote:
> > > > I don't want you to respin. I think this is a bad idea.
> > >
> > > I'm hoping you'll find some more time to explain what exactly you don't
> > > like, because this series only refactors what we already have.
> > >
> > > I enjoy seeing the special casing (especially hugetlb) gone from mm/swap.c.

I don't. The list of 'if's is better than the indirect function call.
That's terribly expensive, and the way we reuse the lru.next field is
fragile. Not to mention that it introduces a new thing for the
hardening people to fret over.

> > And, IMHO, seems like overkill. We have only a handful of cases -
> > maybe we shouldn't be trying to get to full generality but just handle
> > a couple of cases directly? I don't really think it is such a bad
> > thing to have an if ladder on the free path if we have only a couple
> > things. Certainly it looks good instead of doing overlaying tricks.
>
> I'd really like to abstract hugetlb handling if possible. The way it stands
> it's just very odd.

There might be ways to make that better. I haven't really been looking
too hard at making that special handling go away.

> We'll need some reliable way to identify these folios that need care.
> guest_memfd will be using folio->mapcount for now, so for now we couldn't
> set a page type like hugetlb does.

If hugetlb can set lru.next at a certain point, then guest_memfd could
set a page type at a similar point, no?

> > Also how does this translate to Matthew's memdesc world?

In a memdesc world, pages no longer have a refcount. We might still have
put_page(), which will now be a very complicated (and out-of-line)
function that looks up what kind of memdesc it is and operates on the
memdesc's refcount ... if it has one.
I don't know if it'll be exported to modules; I can see uses in the mm
code, but I'm not sure if modules will have a need.

Each memdesc type will have its own function to call to free the
memdesc. So we'll still have folio_put(). But slab does not have, need
nor want a refcount, so it'll just slab_free().

I expect us to keep around a list of recently-freed memdescs of a
particular type with their pages still attached so that we can allocate
them again quickly (or reclaim them under memory pressure). Once that
freelist overflows, we'll free a batch of them to the buddy allocator
(for the pages) and the slab allocator (for the memdesc itself).

> guest_memfd and hugetlb would be operating on folios (at least for now),
> which contain the refcount, lru, private, ... so nothing special there.
>
> Once we actually decoupled "struct folio" from "struct page", we *might*
> have to play less tricks, because we could just have a callback pointer
> there. But well, maybe we also want to save some space in there.
>
> Do we want dedicated memdescs for hugetlb/guest_memfd that extend folios in
> the future? I don't know, maybe.

I've certainly considered going so far as a per-fs folio. So we'd have
an ext4_folio, a btrfs_folio, an iomap_folio, etc. That'd let us get rid
of folio->private, but I'm not sure that C's type system can really
handle this nicely. Maybe in a Rust world ;-)

What I'm thinking about is that I'd really like to be able to declare
that all the functions in ext4_aops only accept pointers to ext4_folio,
so ext4_dirty_folio() can't be called with pointers to _any_ folio, but
specifically folios which were previously allocated for ext4. I don't
know if Rust lets you do something like that.

> I'm currently wondering if we can use folio->private for the time being.
> Either
>
> (a) If folio->private is still set once the refcount drops to 0, it
> indicates that there is a freeing callback/owner_ops.
> We'll have to make
> hugetlb not use folio->private and convert others to clear folio->private
> before freeing.
>
> (b) Use bitX of folio->private to indicate that this has "owner_ops"
> meaning. We'll have to make hugetlb not use folio->private and make others
> not use bitX. Might be harder and overkill, because right now we only really
> need the callback when refcount==0.
>
> (c) Use some other indication that folio->private contains folio_ops.

I really don't want to use folio_ops / folio_owner_ops.

I read
https://lore.kernel.org/all/CAGtprH_JP2w-4rq02h_Ugvq5KuHX7TUvegOS7xUs_iy5hriE7g at mail.gmail.com/
and I still don't understand what you're trying to do. Would it work to
use aops->free_folio() to notify you when the folio is being removed
from the address space?