Jiang Liu
2013-Mar-05 14:54 UTC
[RFC PATCH v1 00/33] accurately calculate pages managed by buddy system
The original goal of this patchset was to fix the bug reported at
https://bugzilla.kernel.org/show_bug.cgi?id=53501
It has since been expanded to also reduce duplicated code in memory
initialization. In total it removes about 550 lines of code.
Patch 1:
Extract common helper functions from free_init_mem() and
free_initrd_mem() on different architectures.
Patches 2-27:
Use the helper functions to simplify free_init_mem() and
free_initrd_mem() on different architectures. This removes
about 500 lines of code.
Patch 28:
Introduce a common helper function to free highmem pages when
initializing the memory subsystem.
Patches 29-32:
Adjust totalhigh_pages, totalram_pages and zone->managed_pages
together when reserving/unreserving pages.
Patch 33:
Change /sys/.../node/nodeX/meminfo to report pages available
within the node as "MemTotal".
We have only tested this patchset on x86 platforms, and have done basic
compilation tests using cross-compilers from ftp.kernel.org. That means
some code may not compile on some architectures. Any help testing
this patchset is welcome!
Jiang Liu (33):
mm: introduce common helper functions to deal with reserved/managed
pages
mm/alpha: use common helper functions to free reserved pages
mm/ARM: use common helper functions to free reserved pages
mm/avr32: use common helper functions to free reserved pages
mm/blackfin: use common helper functions to free reserved pages
mm/c6x: use common helper functions to free reserved pages
mm/cris: use common helper functions to free reserved pages
mm/FRV: use common helper functions to free reserved pages
mm/h8300: use common helper functions to free reserved pages
mm/IA64: use common helper functions to free reserved pages
mm/m32r: use common helper functions to free reserved pages
mm/m68k: use common helper functions to free reserved pages
mm/microblaze: use common helper functions to free reserved pages
mm/MIPS: use common helper functions to free reserved pages
mm/mn10300: use common helper functions to free reserved pages
mm/openrisc: use common helper functions to free reserved pages
mm/parisc: use common helper functions to free reserved pages
mm/ppc: use common helper functions to free reserved pages
mm/s390: use common helper functions to free reserved pages
mm/score: use common helper functions to free reserved pages
mm/SH: use common helper functions to free reserved pages
mm/SPARC: use common helper functions to free reserved pages
mm/um: use common helper functions to free reserved pages
mm/unicore32: use common helper functions to free reserved pages
mm/x86: use common helper functions to free reserved pages
mm/xtensa: use common helper functions to free reserved pages
mm,kexec: use common helper functions to free reserved pages
mm: introduce free_highmem_page() helper to free highmem pages into
buddy system
mm: accurately calculate zone->managed_pages for highmem zones
mm: use a dedicated lock to protect totalram_pages and
zone->managed_pages
mm: avoid using __free_pages_bootmem() at runtime
mm: correctly update zone->managed_pages
mm: report available pages as "MemTotal" for each NUMA node
arch/alpha/kernel/sys_nautilus.c | 5 +-
arch/alpha/mm/init.c | 24 ++-------
arch/alpha/mm/numa.c | 3 +-
arch/arm/mm/init.c | 46 ++++++-----------
arch/arm64/mm/init.c | 26 +---------
arch/avr32/mm/init.c | 24 +--------
arch/blackfin/mm/init.c | 20 +-------
arch/c6x/mm/init.c | 30 +----------
arch/cris/mm/init.c | 16 +-----
arch/frv/mm/init.c | 32 ++----------
arch/h8300/mm/init.c | 28 +----------
arch/ia64/mm/init.c | 23 ++-------
arch/m32r/mm/init.c | 26 ++--------
arch/m68k/mm/init.c | 24 +--------
arch/microblaze/include/asm/setup.h | 1 -
arch/microblaze/mm/init.c | 33 ++----------
arch/mips/mm/init.c | 36 ++++----------
arch/mips/sgi-ip27/ip27-memory.c | 4 +-
arch/mn10300/mm/init.c | 23 +--------
arch/openrisc/mm/init.c | 27 ++--------
arch/parisc/mm/init.c | 24 ++-------
arch/powerpc/kernel/crash_dump.c | 5 +-
arch/powerpc/kernel/fadump.c | 5 +-
arch/powerpc/kernel/kvm.c | 7 +--
arch/powerpc/mm/mem.c | 34 ++-----------
arch/powerpc/platforms/512x/mpc512x_shared.c | 5 +-
arch/s390/mm/init.c | 35 +++----------
arch/score/mm/init.c | 33 ++----------
arch/sh/mm/init.c | 26 ++--------
arch/sparc/kernel/leon_smp.c | 15 ++----
arch/sparc/mm/init_32.c | 50 +++----------------
arch/sparc/mm/init_64.c | 25 ++--------
arch/tile/mm/init.c | 4 +-
arch/um/kernel/mem.c | 25 ++--------
arch/unicore32/mm/init.c | 26 +---------
arch/x86/mm/init.c | 5 +-
arch/x86/mm/init_32.c | 10 +---
arch/x86/mm/init_64.c | 18 +------
arch/xtensa/mm/init.c | 21 ++------
drivers/virtio/virtio_balloon.c | 8 +--
drivers/xen/balloon.c | 19 ++-----
include/linux/mm.h | 36 ++++++++++++++
include/linux/mmzone.h | 14 ++++--
kernel/kexec.c | 8 +--
mm/bootmem.c | 16 ++----
mm/hugetlb.c | 2 +-
mm/memory_hotplug.c | 31 ++----------
mm/nobootmem.c | 14 ++----
mm/page_alloc.c | 69 ++++++++++++++++++++++----
49 files changed, 248 insertions(+), 793 deletions(-)
--
1.7.9.5
Jiang Liu
2013-Mar-05 14:54 UTC
[RFC PATCH v1 01/33] mm: introduce common helper functions to deal with reserved/managed pages
Code to deal with reserved/managed pages is duplicated across many
architectures, so introduce common helper functions to reduce the
duplication. These helpers will also be used to concentrate the code
that modifies totalram_pages and zone->managed_pages, which makes the
code much clearer.
Signed-off-by: Jiang Liu <jiang.liu at huawei.com>
---
include/linux/mm.h | 37 +++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 20 ++++++++++++++++++++
2 files changed, 57 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7acc9dc..881461c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1295,6 +1295,43 @@ extern void free_area_init_node(int nid, unsigned long *zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
extern void free_initmem(void);
extern void free_initmem(void);
+/* Helper functions to deal with reserved/managed pages. */
+extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
+				int poison, char *s);
+
+static inline void adjust_managed_page_count(struct page *page, long count)
+{
+	totalram_pages += count;
+}
+
+static inline void __free_reserved_page(struct page *page)
+{
+	ClearPageReserved(page);
+	init_page_count(page);
+	__free_page(page);
+}
+
+static inline void free_reserved_page(struct page *page)
+{
+	__free_reserved_page(page);
+	adjust_managed_page_count(page, 1);
+}
+
+static inline void mark_page_reserved(struct page *page)
+{
+	SetPageReserved(page);
+	adjust_managed_page_count(page, -1);
+}
+
+static inline void free_initmem_default(int poison)
+{
+	extern char __init_begin[], __init_end[];
+
+	free_reserved_area(PAGE_ALIGN((unsigned long)&__init_begin),
+			   ((unsigned long)&__init_end) & PAGE_MASK,
+			   poison, "unused kernel");
+}
+
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
/*
* With CONFIG_HAVE_MEMBLOCK_NODE_MAP set, an architecture may initialise its
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..0fadb09 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
early_param("movablecore", cmdline_parse_movablecore);
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+unsigned long free_reserved_area(unsigned long start, unsigned long end,
+				int poison, char *s)
+{
+	unsigned long pages, pos;
+
+	pos = start = PAGE_ALIGN(start);
+	end &= PAGE_MASK;
+	for (pages = 0; pos < end; pos += PAGE_SIZE, pages++) {
+		if (poison)
+			memset((void *)pos, poison, PAGE_SIZE);
+		free_reserved_page(virt_to_page(pos));
+	}
+
+	if (pages && s)
+		pr_info("Freeing %s memory: %ldK (%lx - %lx)\n",
+			s, pages << (PAGE_SHIFT - 10), start, end);
+
+	return pages;
+}
+
/**
 * set_dma_reserve - set the specified number of pages reserved in the first zone
 * @new_dma_reserve: The number of pages to mark reserved
--
1.7.9.5
Jiang Liu
2013-Mar-05 14:55 UTC
[RFC PATCH v1 32/33] mm: correctly update zone->managed_pages
Enhance adjust_managed_page_count() to also adjust totalhigh_pages for
highmem pages, and convert code that directly adjusts totalram_pages
to use adjust_managed_page_count() instead, because it adjusts
totalram_pages, totalhigh_pages and zone->managed_pages together in a
safe way.
Remove inc_totalhigh_pages() and dec_totalhigh_pages() from the Xen
balloon driver because adjust_managed_page_count() already adjusts
totalhigh_pages. This patch also enhances the virtio_balloon driver to
adjust totalhigh_pages when reserving/unreserving pages.
Signed-off-by: Jiang Liu <jiang.liu at huawei.com>
Cc: Chris Metcalf <cmetcalf at tilera.com>
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst at redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Cc: Jeremy Fitzhardinge <jeremy at goop.org>
Cc: Wen Congyang <wency at cn.fujitsu.com>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Tang Chen <tangchen at cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki at jp.fujitsu.com>
Cc: Mel Gorman <mgorman at suse.de>
Cc: Minchan Kim <minchan at kernel.org>
Cc: linux-kernel at vger.kernel.org
Cc: virtualization at lists.linux-foundation.org
Cc: xen-devel at lists.xensource.com
Cc: linux-mm at kvack.org
---
arch/tile/mm/init.c | 4 ++--
drivers/virtio/virtio_balloon.c | 8 +++++---
drivers/xen/balloon.c | 19 ++++---------------
mm/hugetlb.c | 2 +-
mm/memory_hotplug.c | 15 +++------------
mm/page_alloc.c | 6 ++++++
6 files changed, 21 insertions(+), 33 deletions(-)
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 2749515..5886aef 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -720,7 +720,7 @@ static void __init init_free_pfn_range(unsigned long start, unsigned long end)
}
init_page_count(page);
__free_pages(page, order);
- totalram_pages += count;
+ adjust_managed_page_count(page, count);
page += count;
pfn += count;
@@ -1033,7 +1033,7 @@ static void free_init_pages(char *what, unsigned long begin, unsigned long end)
pfn_pte(pfn, PAGE_KERNEL));
memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
free_page(addr);
- totalram_pages++;
+ adjust_managed_page_count(page, 1);
}
pr_info("Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
}
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8dab163..4c6ec53 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -148,7 +148,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
}
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
- totalram_pages--;
+ adjust_managed_page_count(page, -1);
}
/* Did we get any? */
@@ -160,11 +160,13 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
{
unsigned int i;
+ struct page *page;
/* Find pfns pointing at start of each page, get pages and free them. */
for (i = 0; i < num; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- balloon_page_free(balloon_pfn_to_page(pfns[i]));
- totalram_pages++;
+ page = balloon_pfn_to_page(pfns[i]);
+ balloon_page_free(page);
+ adjust_managed_page_count(page, 1);
}
}
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index a56776d..a5fdbcc 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -89,14 +89,6 @@ EXPORT_SYMBOL_GPL(balloon_stats);
/* We increase/decrease in batches which fit in a page */
static xen_pfn_t frame_list[PAGE_SIZE / sizeof(unsigned long)];
-#ifdef CONFIG_HIGHMEM
-#define inc_totalhigh_pages() (totalhigh_pages++)
-#define dec_totalhigh_pages() (totalhigh_pages--)
-#else
-#define inc_totalhigh_pages() do {} while (0)
-#define dec_totalhigh_pages() do {} while (0)
-#endif
-
/* List of ballooned pages, threaded through the mem_map array. */
static LIST_HEAD(ballooned_pages);
@@ -132,9 +124,7 @@ static void __balloon_append(struct page *page)
static void balloon_append(struct page *page)
{
__balloon_append(page);
- if (PageHighMem(page))
- dec_totalhigh_pages();
- totalram_pages--;
+ adjust_managed_page_count(page, -1);
}
/* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
@@ -151,13 +141,12 @@ static struct page *balloon_retrieve(bool prefer_highmem)
page = list_entry(ballooned_pages.next, struct page, lru);
list_del(&page->lru);
- if (PageHighMem(page)) {
+ if (PageHighMem(page))
balloon_stats.balloon_high--;
- inc_totalhigh_pages();
- } else
+ else
balloon_stats.balloon_low--;
- totalram_pages++;
+ adjust_managed_page_count(page, 1);
return page;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0a0be33..a381818 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1246,7 +1246,7 @@ static void __init gather_bootmem_prealloc(void)
* side-effects, like CommitLimit going negative.
*/
if (h->order > (MAX_ORDER - 1))
- totalram_pages += 1 << h->order;
+ adjust_managed_page_count(page, 1 << h->order);
}
}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index af9e87f..f9ce564 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -760,20 +760,13 @@ EXPORT_SYMBOL_GPL(__online_page_set_limits);
void __online_page_increment_counters(struct page *page)
{
- totalram_pages++;
-
-#ifdef CONFIG_HIGHMEM
- if (PageHighMem(page))
- totalhigh_pages++;
-#endif
+ adjust_managed_page_count(page, 1);
}
EXPORT_SYMBOL_GPL(__online_page_increment_counters);
void __online_page_free(struct page *page)
{
- ClearPageReserved(page);
- init_page_count(page);
- __free_page(page);
+ __free_reserved_page(page);
}
EXPORT_SYMBOL_GPL(__online_page_free);
@@ -970,7 +963,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
return ret;
}
- zone->managed_pages += onlined_pages;
zone->present_pages += onlined_pages;
zone->zone_pgdat->node_present_pages += onlined_pages;
if (onlined_pages) {
@@ -1554,10 +1546,9 @@ repeat:
/* reset pagetype flags and makes migrate type to be MOVABLE */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
/* removal success */
- zone->managed_pages -= offlined_pages;
+ adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
zone->present_pages -= offlined_pages;
zone->zone_pgdat->node_present_pages -= offlined_pages;
- totalram_pages -= offlined_pages;
init_per_zone_wmark_min();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d252443..041eb92 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -780,7 +780,9 @@ void __init init_cma_reserved_pageblock(struct page *page)
#ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages += pageblock_nr_pages;
+ else
#endif
+ page_zone(page)->managed_pages += pageblock_nr_pages;
}
#endif
@@ -5119,6 +5121,10 @@ void adjust_managed_page_count(struct page *page, long count)
page_zone(page)->managed_pages += count;
totalram_pages += count;
+#ifdef CONFIG_HIGHMEM
+ if (PageHighMem(page))
+ totalhigh_pages += count;
+#endif
if (lock)
spin_unlock(&managed_page_count_lock);
--
1.7.9.5
Sam Ravnborg
2013-Mar-05 19:47 UTC
[RFC PATCH v1 01/33] mm: introduce common helper functions to deal with reserved/managed pages
On Tue, Mar 05, 2013 at 10:54:44PM +0800, Jiang Liu wrote:
> Code to deal with reserved/managed pages are duplicated by many
> architectures, so introduce common help functions to reduce duplicated
> code. These common help functions will also be used to concentrate code
> to modify totalram_pages and zone->managed_pages, which makes the code
> much more clear.
>
> Signed-off-by: Jiang Liu <jiang.liu at huawei.com>
> ---
> include/linux/mm.h | 37 +++++++++++++++++++++++++++++++++++++
> mm/page_alloc.c | 20 ++++++++++++++++++++
> 2 files changed, 57 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7acc9dc..881461c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1295,6 +1295,43 @@ extern void free_area_init_node(int nid, unsigned long *zones_size,
> 	unsigned long zone_start_pfn, unsigned long *zholes_size);
> extern void free_initmem(void);
>
> +/* Help functions to deal with reserved/managed pages. */
> +extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
> +				int poison, char *s);
> +
> +static inline void adjust_managed_page_count(struct page *page, long count)
> +{
> +	totalram_pages += count;
> +}

What is the purpose of the unused page argument?

> +
> +static inline void __free_reserved_page(struct page *page)
> +{
> +	ClearPageReserved(page);
> +	init_page_count(page);
> +	__free_page(page);
> +}

This method is useful for architectures which implement HIGHMEM, like
32 bit x86 and 32 bit sparc. This calls for a name without underscores.

> +
> +static inline void free_reserved_page(struct page *page)
> +{
> +	__free_reserved_page(page);
> +	adjust_managed_page_count(page, 1);
> +}
> +
> +static inline void mark_page_reserved(struct page *page)
> +{
> +	SetPageReserved(page);
> +	adjust_managed_page_count(page, -1);
> +}
> +
> +static inline void free_initmem_default(int poison)
> +{

Why require the user to supply the poison argument?
If this is the default implementation then use the default poison
value too (POISON_FREE_INITMEM).

> +	extern char __init_begin[], __init_end[];
> +
> +	free_reserved_area(PAGE_ALIGN((unsigned long)&__init_begin),
> +			   ((unsigned long)&__init_end) & PAGE_MASK,
> +			   poison, "unused kernel");
> +}

Maybe it is just me who is not used to this area of the kernel, but a
few comments that describe the purpose of each function would have
helped me.

	Sam