David Hildenbrand
2021-Dec-10 18:36 UTC
[RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.
On 10.12.21 00:04, Zi Yan wrote:
> From: Zi Yan <ziy at nvidia.com>
>
> Hi all,

Hi,

thanks for working on that!

>
> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
> adjustable at boot time[1].
>
> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
> isolates pageblocks to remove free memory from buddy allocator but isolating
> only a subset of pageblocks within a page spanning across multiple pageblocks
> causes free page accounting issues. Isolated page might not be put into the
> right free list, since the code assumes the migratetype of the first pageblock
> as the whole free page migratetype. This is based on the discussion at [2].
>
> To remove the requirement, this patchset:
> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
> 2. but saves the pageblock migratetypes outside the specified range of
> alloc_contig_range() and restores them after all pages within the range
> become free after __alloc_contig_migrate_range();
> 3. splits free pages spanning multiple pageblocks at the beginning and the end
> of the range and puts the split pages to the right migratetype free lists
> based on the pageblock migratetypes;
> 4. returns pages not in the range as it did before this patch.
>
> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
> to make sure all pageblocks belonging to a single page are isolated together
> and later pageblocks outside the range need to have their migratetypes restored;
> or 2) extra logic will need to be added during page free time to split a free
> page with multi-migratetype pageblocks.
>
> Two optimizations might come later:
> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
> range during isolation to increase successful rate of alloc_contig_range().

The issue with virtio-mem is that we'll need that as soon as we change
the granularity to pageblocks, because otherwise, you can heavily
degrade unplug reliability in sane setups:

Previous:
* Try unplug free 4M range (2 pageblocks): succeeds

Now:
* Try unplug 2M range (first pageblock): succeeds.
* Try unplug next 2M range (second pageblock): fails because first
  contains unmovable allocations.

-- 
Thanks,

David / dhildenb
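
The unplug scenario above can be modeled with a small userspace sketch. This is not kernel code: all names (toy_try_unplug(), TOY_PAGEBLOCK_NR_PAGES, and so on) are hypothetical, the sizes assume 4K pages with 2M pageblocks and a 4M MAX_ORDER - 1 block, and it assumes that an already unplugged pageblock shows up as unmovable to a later isolation check. It only illustrates why the alignment of the unmovable-page check matters once requests shrink to a single pageblock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sizes (assumptions): 4K pages, 2M pageblocks, 4M MAX_ORDER - 1 block. */
#define TOY_PAGEBLOCK_NR_PAGES 512UL
#define TOY_MAX_ORDER_NR_PAGES 1024UL
#define TOY_NR_PAGEBLOCKS      2UL

/* true if a pageblock holds unmovable (here: already unplugged) memory */
static bool toy_unmovable[TOY_NR_PAGEBLOCKS];

/* Mark every pageblock as movable/free again. */
void toy_reset(void)
{
    for (unsigned long i = 0; i < TOY_NR_PAGEBLOCKS; i++)
        toy_unmovable[i] = false;
}

/* Round x down/up to a power-of-two alignment a. */
static unsigned long toy_align_down(unsigned long x, unsigned long a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return x & ~(a - 1);
}

static unsigned long toy_align_up(unsigned long x, unsigned long a)
{
    return toy_align_down(x + a - 1, a);
}

/*
 * Try to unplug the pageblock-aligned range [start_pfn, end_pfn).
 * The unmovable check covers the range widened to check_align, which
 * models isolating (and checking) at that granularity. On success the
 * requested pageblocks become unmovable, like memory handed to the
 * hypervisor in this toy model.
 */
bool toy_try_unplug(unsigned long start_pfn, unsigned long end_pfn,
                    unsigned long check_align)
{
    unsigned long s = toy_align_down(start_pfn, check_align);
    unsigned long e = toy_align_up(end_pfn, check_align);
    unsigned long pfn;

    assert(start_pfn % TOY_PAGEBLOCK_NR_PAGES == 0);
    assert(end_pfn % TOY_PAGEBLOCK_NR_PAGES == 0);
    assert(e <= TOY_NR_PAGEBLOCKS * TOY_PAGEBLOCK_NR_PAGES);

    /* Any unmovable pageblock in the checked range fails the request. */
    for (pfn = s; pfn < e; pfn += TOY_PAGEBLOCK_NR_PAGES)
        if (toy_unmovable[pfn / TOY_PAGEBLOCK_NR_PAGES])
            return false;

    /* Success: the requested pageblocks are now gone from the buddy. */
    for (pfn = start_pfn; pfn < end_pfn; pfn += TOY_PAGEBLOCK_NR_PAGES)
        toy_unmovable[pfn / TOY_PAGEBLOCK_NR_PAGES] = true;
    return true;
}
```

With check_align set to TOY_MAX_ORDER_NR_PAGES, unplugging pfns [0, 512) succeeds and the follow-up request for [512, 1024) fails, because the widened check sees the first pageblock. With check_align set to TOY_PAGEBLOCK_NR_PAGES, which corresponds to the first optimization in the quoted cover letter, both requests succeed.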