Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
Changelog v5 [since v4]:
 - used populate_physmap, fixed bugs.
[v2-v4: not posted]
 - reworked the code in setup.c to work properly.
[v1: https://lkml.org/lkml/2012/3/30/492]
 - initial patchset

The patches that did not change between the first posting and this one are:

  xen/p2m: Move code around to allow for better re-usage.
  xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
  xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
  xen/p2m: An early bootup variant of set_phys_to_machine

so if you read them before - you can skip them.

The purpose of these patches is three-fold:

1) Fix the case where dom0_mem=X is not used and the machine boots with less
memory than it should. A subsequent 'xl mem-set 0 X' is required to balloon
up to the right amount. Here is an example of what it looks like with the
older kernels and with a pvops kernel (with and without the patches):

2.6.32 SLES:
Memory: 7538688k/8079432k available (3971k kernel code, 8192k absent, 532300k reserved, 2491k data, 348k init)
MemTotal: 8063140 kB
MemFree: 7421504 kB
DirectMap4k: 8071240 kB
Domain-0   0  7873  4  r-----  20.3

3.3:
Memory: 6486452k/9208688k available (5825k kernel code, 1136060k absent, 1586176k reserved, 2890k data, 692k init)
MemTotal: 6716156 kB
MemFree: 6365696 kB
DirectMap4k: 8078192 kB
Domain-0   0  6774  4  r-----  26.0

3.3+patches:
Memory: 7621460k/9208688k available (5817k kernel code, 1136060k absent, 451168k reserved, 2899k data, 692k init)
MemTotal: 7849924 kB
MemFree: 7500748 kB
DirectMap4k: 8078192 kB
Domain-0   0  7883  4  r-----  11.9

This is implemented in:
[PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries

2) Fix the case where dom0_mem=XXXG is used (so the 'max' parameter is not
involved).
The bug is that we would ignore it. With dom0_mem=1G:

2.6.32 SLES:
Memory: 650240k/1056768k available (3971k kernel code, 8192k absent, 397852k reserved, 2491k data, 348k init)
3.3:
Memory: 610884k/9435136k available (5817k kernel code, 1136060k absent, 7688192k reserved, 2899k data, 696k init)
3.3+patches:
Memory: 724184k/1053064k available (5817k kernel code, 4552k absent, 324328k reserved, 2899k data, 696k init)

This is implemented in:
[PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.

Note that the combination of 'dom0_mem=XXXG,max:YYYG' is perfectly fine, so
there is no need to fix that. With dom0_mem=1G,max:3G:

2.6.32 SLES:
Memory: 650240k/1056768k available (3971k kernel code, 8192k absent, 397852k reserved, 2491k data, 348k init)
3.3:
Memory: 690324k/4281724k available (5826k kernel code, 1136060k absent, 2455340k reserved, 2891k data, 692k init)
3.3+patches:
Memory: 690960k/4281724k available (5824k kernel code, 1136060k absent, 2454704k reserved, 2893k data, 692k init)

Konrad Rzeszutek Wilk (8):
  xen/p2m: Move code around to allow for better re-usage.
  xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
  xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
  xen/p2m: An early bootup variant of set_phys_to_machine
  xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
  xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.
  xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
  xen/setup: Combine the two hypercall functions - since they are quite similar.

 arch/x86/include/asm/xen/page.h |    1 +
 arch/x86/xen/p2m.c              |  104 ++++++++++++++++-----------
 arch/x86/xen/setup.c            |  152 +++++++++++++++++++++++++++++++++------
 3 files changed, 193 insertions(+), 64 deletions(-)
Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH 1/8] xen/p2m: Move code around to allow for better re-usage.
We are going to be using the early_alloc_p2m (and early_alloc_p2m_middle)
code in follow-up patches which are not related to setting identity pages.
Hence let us move the code out into its own function and rename the
functions as appropriate.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |   62 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 1b267e7..3cc3afe 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -499,7 +499,7 @@ static bool alloc_p2m(unsigned long pfn)
 	return true;
 }
 
-static bool __init __early_alloc_p2m(unsigned long pfn)
+static bool __init early_alloc_p2m_middle(unsigned long pfn)
 {
 	unsigned topidx, mididx, idx;
 
@@ -541,6 +541,36 @@ static bool __init __early_alloc_p2m(unsigned long pfn)
 	}
 	return idx != 0;
 }
+
+static bool __init early_alloc_p2m(unsigned long pfn)
+{
+	unsigned topidx = p2m_top_index(pfn);
+	unsigned long *mid_mfn_p;
+	unsigned long **mid;
+
+	mid = p2m_top[topidx];
+	mid_mfn_p = p2m_top_mfn_p[topidx];
+	if (mid == p2m_mid_missing) {
+		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+
+		p2m_mid_init(mid);
+
+		p2m_top[topidx] = mid;
+
+		BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
+	}
+	/* And the save/restore P2M tables.. */
+	if (mid_mfn_p == p2m_mid_missing_mfn) {
+		mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		p2m_mid_mfn_init(mid_mfn_p);
+
+		p2m_top_mfn_p[topidx] = mid_mfn_p;
+		p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
+		/* Note: we don't set mid_mfn_p[midix] here,
+		 * look in early_alloc_p2m_middle */
+	}
+	return true;
+}
 
 unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 				      unsigned long pfn_e)
 {
@@ -559,35 +589,11 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 		pfn < ALIGN(pfn_e, (P2M_MID_PER_PAGE * P2M_PER_PAGE));
 		pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE)
 	{
-		unsigned topidx = p2m_top_index(pfn);
-		unsigned long *mid_mfn_p;
-		unsigned long **mid;
-
-		mid = p2m_top[topidx];
-		mid_mfn_p = p2m_top_mfn_p[topidx];
-		if (mid == p2m_mid_missing) {
-			mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
-
-			p2m_mid_init(mid);
-
-			p2m_top[topidx] = mid;
-
-			BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
-		}
-		/* And the save/restore P2M tables.. */
-		if (mid_mfn_p == p2m_mid_missing_mfn) {
-			mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
-			p2m_mid_mfn_init(mid_mfn_p);
-
-			p2m_top_mfn_p[topidx] = mid_mfn_p;
-			p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
-			/* Note: we don't set mid_mfn_p[midix] here,
-			 * look in __early_alloc_p2m */
-		}
+		WARN_ON(!early_alloc_p2m(pfn));
 	}
 
-	__early_alloc_p2m(pfn_s);
-	__early_alloc_p2m(pfn_e);
+	early_alloc_p2m_middle(pfn_s);
+	early_alloc_p2m_middle(pfn_e);
 
 	for (pfn = pfn_s; pfn < pfn_e; pfn++)
 		if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
-- 
1.7.7.5
Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH 2/8] xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
For identity cases we want to call reserve_brk only on the boundary
conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is to
work around identity regions (PCI spaces, gaps in E820) which are not
aligned on 2MB regions.

However, for the case where we want to allocate P2M middle leafs at the
early bootup stage, regardless of this alignment check, we need some means
of doing that. For that we provide the new argument.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 3cc3afe..8b3a395 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -499,7 +499,7 @@ static bool alloc_p2m(unsigned long pfn)
 	return true;
 }
 
-static bool __init early_alloc_p2m_middle(unsigned long pfn)
+static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary)
 {
 	unsigned topidx, mididx, idx;
 
@@ -508,7 +508,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
 	idx = p2m_index(pfn);
 
 	/* Pfff.. No boundary cross-over, lets get out. */
-	if (!idx)
+	if (!idx && check_boundary)
 		return false;
 
 	WARN(p2m_top[topidx][mididx] == p2m_identity,
@@ -531,7 +531,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
 		p2m_top[topidx][mididx] = p2m;
 
 		/* For save/restore we need to MFN of the P2M saved */
-
+		
 		mid_mfn_p = p2m_top_mfn_p[topidx];
 		WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
 			"P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
@@ -592,8 +592,8 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 		WARN_ON(!early_alloc_p2m(pfn));
 	}
 
-	early_alloc_p2m_middle(pfn_s);
-	early_alloc_p2m_middle(pfn_e);
+	early_alloc_p2m_middle(pfn_s, true);
+	early_alloc_p2m_middle(pfn_e, true);
 
 	for (pfn = pfn_s; pfn < pfn_e; pfn++)
 		if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
-- 
1.7.7.5
Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH 3/8] xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
At the start of the function we were checking for idx != 0 and bailing
out. And later we were calling extend_brk only if idx != 0. That is
unnecessary, so remove those checks.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c |   25 ++++++++++++-------------
 1 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 8b3a395..952edef 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -502,6 +502,8 @@ static bool alloc_p2m(unsigned long pfn)
 static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary)
 {
 	unsigned topidx, mididx, idx;
+	unsigned long *p2m;
+	unsigned long *mid_mfn_p;
 
 	topidx = p2m_top_index(pfn);
 	mididx = p2m_mid_index(pfn);
@@ -522,24 +524,21 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary
 		return false;
 
 	/* Boundary cross-over for the edges: */
-	if (idx) {
-		unsigned long *p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
-		unsigned long *mid_mfn_p;
+	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
 
-		p2m_init(p2m);
+	p2m_init(p2m);
 
-		p2m_top[topidx][mididx] = p2m;
+	p2m_top[topidx][mididx] = p2m;
 
-		/* For save/restore we need to MFN of the P2M saved */
+	/* For save/restore we need to MFN of the P2M saved */
 
-		mid_mfn_p = p2m_top_mfn_p[topidx];
-		WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
-			"P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
-			topidx, mididx);
-		mid_mfn_p[mididx] = virt_to_mfn(p2m);
+	mid_mfn_p = p2m_top_mfn_p[topidx];
+	WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
+		"P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
+		topidx, mididx);
+	mid_mfn_p[mididx] = virt_to_mfn(p2m);
 
-	}
-	return idx != 0;
+	return true;
 }
 
 static bool __init early_alloc_p2m(unsigned long pfn)
-- 
1.7.7.5
Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH 4/8] xen/p2m: An early bootup variant of set_phys_to_machine
During early bootup we can't use alloc_page, so to allocate leaf pages in
the P2M we need to use extend_brk. For that we are utilizing the
early_alloc_p2m and early_alloc_p2m_middle functions to do the job for us.
This function follows the same logic as set_phys_to_machine.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h |    1 +
 arch/x86/xen/p2m.c              |   15 +++++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index c34f96c..93971e8 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -44,6 +44,7 @@ extern unsigned long machine_to_phys_nr;
 
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
+extern bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern unsigned long set_phys_range_identity(unsigned long pfn_s,
 					     unsigned long pfn_e);
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 952edef..ffd08c4 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -570,6 +570,21 @@ static bool __init early_alloc_p2m(unsigned long pfn)
 	}
 	return true;
 }
+bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+	if (unlikely(!__set_phys_to_machine(pfn, mfn))) {
+		if (!early_alloc_p2m(pfn))
+			return false;
+
+		if (!early_alloc_p2m_middle(pfn, false /* boundary crossover OK!*/))
+			return false;
+
+		if (!__set_phys_to_machine(pfn, mfn))
+			return false;
+	}
+
+	return true;
+}
 
 unsigned long __init set_phys_range_identity(unsigned long pfn_s,
 				      unsigned long pfn_e)
 {
-- 
1.7.7.5
Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH 5/8] xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
Otherwise we can get these meaningless messages:

Freeing  bad80-badf4 pfn range: 0 pages freed

We can also do this for the summary ones - there is no point in printing
"Set 0 page(s) to 1-1 mapping".

[v1: Extended to the summary printks]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 1ba8dff..7b0ab77 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -114,8 +114,9 @@ static unsigned long __init xen_release_chunk(unsigned long start,
 			len++;
 		}
 	}
-	printk(KERN_INFO "Freeing %lx-%lx pfn range: %lu pages freed\n",
-	       start, end, len);
+	if (len)
+		printk(KERN_INFO "Freeing %lx-%lx pfn range: %lu pages freed\n",
+		       start, end, len);
 
 	return len;
 }
@@ -162,8 +163,10 @@ static unsigned long __init xen_set_identity_and_release(
 		}
 	}
 
-	printk(KERN_INFO "Released %lu pages of unused memory\n", released);
-	printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
+	if (released)
+		printk(KERN_INFO "Released %lu pages of unused memory\n", released);
+	if (identity)
+		printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
 
 	return released;
 }
-- 
1.7.7.5
Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.
We ignored the X value and ended up populating up to
max(MAX_DOMAIN_PAGES, last E820_RAM entry). This patch fixes it by
figuring out how many RAM pages (nr_pages) the hypervisor wanted to
provide to us and capping the populate hypercalls at that.

The end result is (on an 8GB box):

dom0_mem=1G
-Memory: 610884k/9435136k available (5817k kernel code, 1136060k absent, 7688192k reserved, 2899k data, 696k init)
+Memory: 724184k/1053064k available (5817k kernel code, 4552k absent, 324328k reserved, 2899k data, 696k init)

no dom0_mem
-Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
+Memory: 7621460k/9208688k available (5817k kernel code, 1136060k absent, 451168k reserved, 2899k data, 696k init)

[v1: Details added]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7b0ab77..0e82bf1 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -185,11 +185,29 @@ static unsigned long __init xen_get_max_pages(void)
 	 * the current maximum rather than the static maximum. In this
 	 * case the e820 map provided to us will cover the static
 	 * maximum region.
+	 *
+	 * The dom0_mem=min:X,max:Y option tweaks this differently depending
+	 * on the version, but in general this is what we get:
+	 *
+	 *                   | XENMEM_maximum_reservation | nr_pages
+	 * ------------------+----------------------------+-------------
+	 * no dom0_mem       | INT_MAX                    | max_phys_pfn
+	 * =3G               | INT_MAX                    | 786432
+	 * =max:3G           | 786432                     | 786432
+	 * =min:1G,max:3G    | INT_MAX                    | max_phys_pfn
+	 * =1G,max:3G        | INT_MAX                    | 262144
+	 * =min:1G,max:3G,2G | INT_MAX                    | max_phys_pfn
+	 *
+	 * The =3G variant is often used and it led to us initially setting
+	 * 786432 and allowing dom0 to balloon up to the max_phys_pfn.
+	 * This is at odds with the classic XenOClassic, so let us emulate
+	 * the classic behavior.
 	 */
 	if (xen_initial_domain()) {
 		ret = HYPERVISOR_memory_op(XENMEM_maximum_reservation, &domid);
 		if (ret > 0)
 			max_pages = ret;
+		if (ret == -1UL)
+			max_pages = xen_start_info->nr_pages;
 	}
 
 	return min(max_pages, MAX_DOMAIN_PAGES);
-- 
1.7.7.5
Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
When the Xen hypervisor boots a PV kernel it hands it two pieces of
information: nr_pages and a made-up E820 entry.

The nr_pages value defines the range from zero to nr_pages of PFNs which
have a valid Machine Frame Number (MFN) underneath it. The E820 mirrors
that (with the VGA hole):

BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 00000000000a0000 (usable)
 Xen: 00000000000a0000 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000080800000 (usable)

The fun comes when a PV guest is run with a machine E820 - that can
either be the initial domain or a PCI PV guest - where the E820 looks
like the normal thing:

BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 000000000009e000 (usable)
 Xen: 000000000009ec00 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000020000000 (usable)
 Xen: 0000000020000000 - 0000000020200000 (reserved)
 Xen: 0000000020200000 - 0000000040000000 (usable)
 Xen: 0000000040000000 - 0000000040200000 (reserved)
 Xen: 0000000040200000 - 00000000bad80000 (usable)
 Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
..

With that, overlaying the nr_pages directly on the E820 does not work as
there are gaps and non-RAM regions that won't be used by the memory
allocator. The 'xen_release_chunk' helps with that by punching holes in
the P2M (PFN to MFN lookup tree) for those regions and tells us that:

Freeing  20000-20200 pfn range: 512 pages freed
Freeing  40000-40200 pfn range: 512 pages freed
Freeing  bad80-badf4 pfn range: 116 pages freed
Freeing  badf6-bae7f pfn range: 137 pages freed
Freeing  bb000-100000 pfn range: 282624 pages freed
Released 283999 pages of unused memory

Those 283999 pages are subtracted from the nr_pages and are returned to
the hypervisor. The end result is that the initial domain boots with 1GB
less memory, as the nr_pages has been decreased by the amount of pages
residing within the PCI hole.
It can balloon up to that if desired using 'xl mem-set 0 8092', but the
balloon driver is not always compiled in for the initial domain.

This patch implements the populate hypercall (XENMEM_populate_physmap),
which increases the memory of the domain by the same number of pages that
were released.

The other solution (that did not work) was to transplant the MFNs in the
P2M tree - the ones that were going to be freed were put in the E820_RAM
regions past the nr_pages. But the modifications to the M2P array (the
other side of creating PTEs) were not carried over. The hypervisor is the
only one capable of modifying that, and the only two hypercalls that would
do this are update_va_mapping (which won't work, as during initial bootup
only PFNs up to nr_pages are mapped in the guest) and the populate
hypercall.

The end result is that the kernel can now boot with the nr_pages without
having to subtract the 283999 pages.

On an 8GB machine, with various dom0_mem= parameters this is what we get:

no dom0_mem
-Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
+Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)

dom0_mem=3G
-Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
+Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)

dom0_mem=max:3G
-Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
+Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)

And the 'xm list' or 'xl list' now reflect what the dom0_mem argument is.
[v2: Use populate hypercall]
[v3: Remove debug printks]
[v4: Simplify code]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |  116 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 0e82bf1..b44d409 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -26,7 +26,6 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/physdev.h>
 #include <xen/features.h>
-
 #include "xen-ops.h"
 #include "vdso.h"
 
@@ -120,7 +119,105 @@ static unsigned long __init xen_release_chunk(unsigned long start,
 	return len;
 }
 
+static unsigned long __init xen_populate_physmap(unsigned long start,
+						 unsigned long end)
+{
+	struct xen_memory_reservation reservation = {
+		.address_bits = 0,
+		.extent_order = 0,
+		.domid        = DOMID_SELF
+	};
+	unsigned long len = 0;
+	unsigned long pfn;
+	int ret;
+
+	for (pfn = start; pfn < end; pfn++) {
+		unsigned long frame;
+
+		/* Make sure pfn does not exist to start with */
+		if (pfn_to_mfn(pfn) != INVALID_P2M_ENTRY)
+			continue;
+
+		frame = pfn;
+		set_xen_guest_handle(reservation.extent_start, &frame);
+		reservation.nr_extents = 1;
+
+		ret = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+		WARN(ret != 1, "Failed to populate pfn %lx err=%d\n", pfn, ret);
+		if (ret == 1) {
+			if (!early_set_phys_to_machine(pfn, frame)) {
+				set_xen_guest_handle(reservation.extent_start, &frame);
+				reservation.nr_extents = 1;
+				ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+							   &reservation);
+				break;
+			}
+			len++;
+		} else
+			break;
+	}
+	if (len)
+		printk(KERN_INFO "Populated %lx-%lx pfn range: %lu pages added\n",
+		       start, end, len);
+	return len;
+}
+
+static unsigned long __init xen_populate_chunk(
+	const struct e820entry *list, size_t map_size,
+	unsigned long max_pfn, unsigned long *last_pfn,
+	unsigned long credits_left)
+{
+	const struct e820entry *entry;
+	unsigned int i;
+	unsigned long done = 0;
+	unsigned long dest_pfn;
+
+	for (i = 0, entry = list; i < map_size; i++, entry++) {
+		unsigned long credits = credits_left;
+		unsigned long s_pfn;
+		unsigned long e_pfn;
+		unsigned long pfns;
+		long capacity;
+
+		if (credits <= 0)
+			break;
+
+		if (entry->type != E820_RAM)
+			continue;
+
+		e_pfn = PFN_UP(entry->addr + entry->size);
+
+		/* We only care about E820 after the xen_start_info->nr_pages */
+		if (e_pfn <= max_pfn)
+			continue;
+
+		s_pfn = PFN_DOWN(entry->addr);
+		/* If the E820 falls within the nr_pages, we want to start
+		 * at the nr_pages PFN.
+		 * If that would mean going past the E820 entry, skip it
+		 */
+		if (s_pfn <= max_pfn) {
+			capacity = e_pfn - max_pfn;
+			dest_pfn = max_pfn;
+		} else {
+			/* last_pfn MUST be within E820_RAM regions */
+			if (*last_pfn && e_pfn >= *last_pfn)
+				s_pfn = *last_pfn;
+			capacity = e_pfn - s_pfn;
+			dest_pfn = s_pfn;
+		}
+		/* If we had filled this E820_RAM entry, go to the next one. */
+		if (capacity <= 0)
+			continue;
+
+		if (credits > capacity)
+			credits = capacity;
+
+		pfns = xen_populate_physmap(dest_pfn, dest_pfn + credits);
+		done += pfns;
+		credits_left -= pfns;
+		*last_pfn = (dest_pfn + pfns);
+	}
+	return done;
+}
+
 static unsigned long __init xen_set_identity_and_release(
 	const struct e820entry *list, size_t map_size, unsigned long nr_pages)
 {
@@ -143,7 +240,6 @@ static unsigned long __init xen_set_identity_and_release(
 	 */
 	for (i = 0, entry = list; i < map_size; i++, entry++) {
 		phys_addr_t end = entry->addr + entry->size;
-
 		if (entry->type == E820_RAM || i == map_size - 1) {
 			unsigned long start_pfn = PFN_DOWN(start);
 			unsigned long end_pfn = PFN_UP(end);
@@ -238,7 +334,9 @@ char * __init xen_memory_setup(void)
 	int rc;
 	struct xen_memory_map memmap;
 	unsigned long max_pages;
+	unsigned long last_pfn = 0;
 	unsigned long extra_pages = 0;
+	unsigned long populated;
 	int i;
 	int op;
 
@@ -278,9 +376,20 @@ char * __init xen_memory_setup(void)
 	 */
 	xen_released_pages = xen_set_identity_and_release(
 		map, memmap.nr_entries, max_pfn);
-	extra_pages += xen_released_pages;
 
 	/*
+	 * Populate back the non-RAM pages and E820 gaps that had been
+	 * released. */
+	populated = xen_populate_chunk(map, memmap.nr_entries,
+				       max_pfn, &last_pfn, xen_released_pages);
+
+	extra_pages += (xen_released_pages - populated);
+
+	if (last_pfn > max_pfn) {
+		max_pfn = min(MAX_DOMAIN_PAGES, last_pfn);
+		mem_end = PFN_PHYS(max_pfn);
+	}
+	/*
 	 * Clamp the amount of extra memory to a EXTRA_MEM_RATIO
 	 * factor the base size.  On non-highmem systems, the base
 	 * size is the full initial memory allocation; on highmem it
@@ -293,7 +402,6 @@ char * __init xen_memory_setup(void)
 	 */
 	extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
 			  extra_pages);
-
 	i = 0;
 	while (i < memmap.nr_entries) {
 		u64 addr = map[i].addr;
-- 
1.7.7.5
Konrad Rzeszutek Wilk
2012-Apr-16 17:15 UTC
[PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.
They use the same set of arguments, so it is just a matter of using the
proper hypercall.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   81 ++++++++++++++++++-------------------------------
 1 files changed, 30 insertions(+), 51 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b44d409..506a3e6 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -83,8 +83,8 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
 		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 }
 
-static unsigned long __init xen_release_chunk(unsigned long start,
-					      unsigned long end)
+static unsigned long __init xen_do_chunk(unsigned long start,
+					 unsigned long end, bool release)
 {
 	struct xen_memory_reservation reservation = {
 		.address_bits = 0,
@@ -95,60 +95,36 @@ static unsigned long __init xen_release_chunk(unsigned long start,
 	unsigned long pfn;
 	int ret;
 
-	for(pfn = start; pfn < end; pfn++) {
-		unsigned long mfn = pfn_to_mfn(pfn);
-
-		/* Make sure pfn exists to start with */
-		if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
-			continue;
-
-		set_xen_guest_handle(reservation.extent_start, &mfn);
-		reservation.nr_extents = 1;
-
-		ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-					   &reservation);
-		WARN(ret != 1, "Failed to release pfn %lx err=%d\n", pfn, ret);
-		if (ret == 1) {
-			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
-			len++;
-		}
-	}
-	if (len)
-		printk(KERN_INFO "Freeing %lx-%lx pfn range: %lu pages freed\n",
-		       start, end, len);
-
-	return len;
-}
-static unsigned long __init xen_populate_physmap(unsigned long start,
-						 unsigned long end)
-{
-	struct xen_memory_reservation reservation = {
-		.address_bits = 0,
-		.extent_order = 0,
-		.domid        = DOMID_SELF
-	};
-	unsigned long len = 0;
-	int ret;
-
 	for (pfn = start; pfn < end; pfn++) {
 		unsigned long frame;
+		unsigned long mfn = pfn_to_mfn(pfn);
 
-		/* Make sure pfn does not exists to start with */
-		if (pfn_to_mfn(pfn) != INVALID_P2M_ENTRY)
-			continue;
-
-		frame = pfn;
+		if (release) {
+			/* Make sure pfn exists to start with */
+			if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
+				continue;
+			frame = mfn;
+		} else {
+			if (mfn != INVALID_P2M_ENTRY)
+				continue;
+			frame = pfn;
+		}
 		set_xen_guest_handle(reservation.extent_start, &frame);
 		reservation.nr_extents = 1;
 
-		ret = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
-		WARN(ret != 1, "Failed to populate pfn %lx err=%d\n", pfn, ret);
+		ret = HYPERVISOR_memory_op(release ? XENMEM_decrease_reservation : XENMEM_populate_physmap,
+					   &reservation);
+		WARN(ret != 1, "Failed to %s pfn %lx err=%d\n",
+		     release ? "release" : "populate", pfn, ret);
+
 		if (ret == 1) {
-			if (!early_set_phys_to_machine(pfn, frame)) {
+			if (!early_set_phys_to_machine(pfn, release ? INVALID_P2M_ENTRY : frame)) {
+				if (release)
+					break;
 				set_xen_guest_handle(reservation.extent_start, &frame);
 				reservation.nr_extents = 1;
 				ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-							&reservation);
+							   &reservation);
 				break;
 			}
 			len++;
@@ -156,8 +132,11 @@ static unsigned long __init xen_populate_physmap(unsigned long start,
 			break;
 	}
 	if (len)
-		printk(KERN_INFO "Populated %lx-%lx pfn range: %lu pages added\n",
-		       start, end, len);
+		printk(KERN_INFO "%s %lx-%lx pfn range: %lu pages %s\n",
+		       release ? "Freeing" : "Populating",
+		       start, end, len,
+		       release ? "freed" : "added");
+
 	return len;
 }
 static unsigned long __init xen_populate_chunk(
@@ -211,7 +190,7 @@ static unsigned long __init xen_populate_chunk(
 		if (credits > capacity)
 			credits = capacity;
 
-		pfns = xen_populate_physmap(dest_pfn, dest_pfn + credits);
+		pfns = xen_do_chunk(dest_pfn, dest_pfn + credits, false);
 		done += pfns;
 		credits_left -= pfns;
 		*last_pfn = (dest_pfn + pfns);
@@ -249,8 +228,8 @@ static unsigned long __init xen_set_identity_and_release(
 
 		if (start_pfn < end_pfn) {
 			if (start_pfn < nr_pages)
-				released += xen_release_chunk(
-					start_pfn, min(end_pfn, nr_pages));
+				released += xen_do_chunk(
+					start_pfn, min(end_pfn, nr_pages), true);
 
 			identity += set_phys_range_identity(
 				start_pfn, end_pfn);
-- 
1.7.7.5
Konrad Rzeszutek Wilk
2012-May-01 16:37 UTC
Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
> Changelog v5 [since v4]:
>  - used populate_physmap, fixed bugs.
> [v2-v4: not posted]
>  - reworked the code in setup.c to work properly.
> [v1: https://lkml.org/lkml/2012/3/30/492]
>  - initial patchset

One bug I found was that with 'dom0_mem=max:1G' (with and without these
patches) I would get a bunch of

(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)

where the (0 of X) was sometimes 1, 2, 3, 4 or 17, depending on the
machine I ran it on. I figured out that the difference was in the ACPI
tables that are allocated - and that those regions, even though they are
returned back to the hypervisor, cannot be repopulated. I can't find the
actual exact piece of code in the hypervisor to pin-point and say "Aha".

What I did was use the same metric that the hypervisor uses to figure out
whether to deny the guest ballooning up - checking d->tot_pages against
d->max_pages. For that the XENMEM_current_reservation hypercall is used.

From e4568b678455f68d374277319fb5cc41f11b6c4f Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 26 Apr 2012 22:11:08 -0400
Subject: [PATCH] xen/setup: Cap amount to populate based on current tot_pages count.

Previous to this patch we would try to populate exactly up to the
xen_released_pages number (so the ones that we released), but that is
incorrect as there are some pages that we thought were released but in
actuality were shared. Depending on the platform it could be small
amounts - 2 pages, but some had much higher - 17.

As such, let us use XENMEM_current_reservation to get the exact count of
pages we are using, subtract it from nr_pages and use the lesser of this
and xen_released_pages to populate back.

This fixes errors such as:

(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 506a3e6..8e7dcfd 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -287,7 +287,15 @@ static unsigned long __init xen_get_max_pages(void)
 
 	return min(max_pages, MAX_DOMAIN_PAGES);
 }
-
+static unsigned long xen_get_current_pages(void)
+{
+	domid_t domid = DOMID_SELF;
+	int ret;
+	ret = HYPERVISOR_memory_op(XENMEM_current_reservation, &domid);
+	if (ret > 0)
+		return ret;
+	return 0;
+}
 static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
 {
 	u64 end = start + size;
@@ -358,7 +366,11 @@ char * __init xen_memory_setup(void)
 
 	/*
 	 * Populate back the non-RAM pages and E820 gaps that had been
-	 * released. */
+	 * released. But cap it as certain regions cannot be repopulated.
+	 */
+	if (xen_get_current_pages())
+		xen_released_pages = min(max_pfn - xen_get_current_pages(),
+					 xen_released_pages);
 	populated = xen_populate_chunk(map, memmap.nr_entries,
 				       max_pfn, &last_pfn, xen_released_pages);
 
-- 
1.7.7.5
Jan Beulich
2012-May-02 09:05 UTC
Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
>>> On 01.05.12 at 18:37, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>> Changelog v5 [since v4]:
>>  - used populate_physmap, fixed bugs.
>> [v2-v4: not posted]
>>  - reworked the code in setup.c to work properly.
>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>  - initial patchset
>
> One bug I found was that with 'dom0_mem=max:1G' (with and without these
> patches) I would get a bunch of
>
> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0
> of 17)
>
> where the (0 of X) sometimes was 1, 2, 3, 4 or 17, depending on the machine
> I ran it on. I figured out that the difference was in the ACPI tables
> that are allocated - and that those regions - even though they are returned
> back to the hypervisor, cannot be repopulated. I can't find the actual
> exact piece of code in the hypervisor to pin-point and say "Aha".

Are you sure about this? For one, given that you wrote that you saw this
with as little as a single page off, I doubt there's any halfway modern
system where ACPI tables all fit into a single page (2 would even seem to
be a theoretical minimum, as there ought to be one NVS region along with
the "normal" tables). Second, the ability to repopulate shouldn't really
depend on the nature of the corresponding machine pages.

Jan
David Vrabel
2012-May-03 11:48 UTC
Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
On 01/05/12 17:37, Konrad Rzeszutek Wilk wrote:
> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>> Changelog v5 [since v4]:
>>  - used populate_physmap, fixed bugs.
>> [v2-v4: not posted]
>>  - reworked the code in setup.c to work properly.
>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>  - initial patchset
>
> One bug I found was that with 'dom0_mem=max:1G' (with and without these
> patches) I would get a bunch of
>
> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
>
> where the (0 of X) sometimes was 1, 2, 3, 4 or 17, depending on the machine
> I ran it on. I figured out that the difference was in the ACPI tables
> that are allocated - and that those regions - even though they are returned
> back to the hypervisor, cannot be repopulated. I can't find the actual
> exact piece of code in the hypervisor to pin-point and say "Aha".

It was tricky to track down what is going on here, but I think I see
what's happening.

The problem pages (on the system I looked at) were located just before
the ISA memory region (so PFN < 0xa0) and so they are mapped in the
bootstrap page tables and have an additional ref, so they are not
immediately freed when the page is released. They do get freed later on,
presumably when the page tables are swapped over.

I think the mapping needs to be removed with
HYPERVISOR_update_va_mapping() before releasing the page. This is
already done for the ISA region in xen_ident_map_ISA().

It may be easier to avoid doing anything with the PFNs < 0x100 and take
the minimal loss of memory.

David
David Vrabel
2012-May-03 11:54 UTC
Re: [Xen-devel] [PATCH 6/8] xen/setup: Work properly with ''dom0_mem=X'' or with not dom0_mem.
On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> We ignored the X value and ended up populating up to
> max(MAX_DOMAIN_PAGES, last E820_RAM entry).
>
> This fixes it by figuring out how many RAM nr_pages the
> hypervisor wanted to provide to us and cap the populate
> hypercalls up to that.

I don't think we should change the behaviour again, particularly as the
dom0_mem documentation doesn't describe this proposed behaviour. I would
drop this patch.

David
David Vrabel
2012-May-03 11:56 UTC
Re: [Xen-devel] [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> This patch implements the populate hypercall (XENMEM_populate_physmap),
> which increases the domain with the same amount of pages that
> were released.

Provided the problem with the unreclaimable pages is resolved:

Acked-by: David Vrabel <david.vrabel@citrix.com>

David
David Vrabel
2012-May-03 11:58 UTC
Re: [Xen-devel] [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.
On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> They use the same set of arguments, so it is just the matter
> of using the proper hypercall.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: David Vrabel <david.vrabel@citrix.com>

But would it be better to fold this into the previous patch?

David
David Vrabel
2012-May-03 11:59 UTC
Re: [PATCH 5/8] xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> Otherwise we can get these meaningless:
> Freeing bad80-badf4 pfn range: 0 pages freed
>
> We also can do this for the summary ones - no point of printing
> "Set 0 page(s) to 1-1 mapping"
>
> [v1: Extended to the summary printks]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: David Vrabel <david.vrabel@citrix.com>

David

> ---
>  arch/x86/xen/setup.c |   11 +++++++----
>  1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 1ba8dff..7b0ab77 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -114,8 +114,9 @@ static unsigned long __init xen_release_chunk(unsigned long start,
>  			len++;
>  		}
>  	}
> -	printk(KERN_INFO "Freeing %lx-%lx pfn range: %lu pages freed\n",
> -	       start, end, len);
> +	if (len)
> +		printk(KERN_INFO "Freeing %lx-%lx pfn range: %lu pages freed\n",
> +		       start, end, len);
>
>  	return len;
>  }
> @@ -162,8 +163,10 @@ static unsigned long __init xen_set_identity_and_release(
>  		}
>  	}
>
> -	printk(KERN_INFO "Released %lu pages of unused memory\n", released);
> -	printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
> +	if (released)
> +		printk(KERN_INFO "Released %lu pages of unused memory\n", released);
> +	if (identity)
> +		printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
>
>  	return released;
>  }
David Vrabel
2012-May-03 15:15 UTC
Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
On 03/05/12 12:48, David Vrabel wrote:
> On 01/05/12 17:37, Konrad Rzeszutek Wilk wrote:
>> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>>> Changelog v5 [since v4]:
>>>  - used populate_physmap, fixed bugs.
>>> [v2-v4: not posted]
>>>  - reworked the code in setup.c to work properly.
>>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>>  - initial patchset
>>
>> One bug I found was that with 'dom0_mem=max:1G' (with and without these
>> patches) I would get a bunch of
>>
>> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
>> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
>>
>> where the (0 of X) sometimes was 1, 2, 3, 4 or 17, depending on the machine
>> I ran it on. I figured out that the difference was in the ACPI tables
>> that are allocated - and that those regions - even though they are returned
>> back to the hypervisor, cannot be repopulated. I can't find the actual
>> exact piece of code in the hypervisor to pin-point and say "Aha".
>
> It was tricky to track down what is going on here but I think I see
> what's happening.
>
> The problem pages (on the system I looked at) were located just before
> the ISA memory region (so PFN < 0xa0) and so they are mapped in the
> bootstrap page tables and have an additional ref so are not immediately
> freed when the page is released. They do get freed later on, presumably
> when the page tables are swapped over.

It's not the bootstrap page tables but those constructed in
xen_setup_kernel_pagetable(), but this has the same effect.

> I think the mapping needs to be removed with
> HYPERVISOR_update_va_mapping() before releasing the page. This is
> already done for the ISA region in xen_ident_map_ISA().

And here's a patch that does this. I've not given it a lot of testing.
This is on top of your 8/8 patch, and your "xen/setup: Cap amount to
populate based on current tot_pages count." patch is no longer needed.
David

8<---------------------
From 17900ce942ed34ccc85b8e6fbce392118d95d9d3 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 3 May 2012 15:57:00 +0100
Subject: [PATCH] xen: update VA mapping when releasing memory during setup

In xen_memory_setup(), if a page that is being released has a VA
mapping, this must also be updated. Otherwise, the page will not be
released completely -- it will still be referenced in Xen and won't be
freed until the mapping is removed, and this prevents it from being
reallocated at a different PFN.

This was already being done for the ISA memory region in
xen_ident_map_ISA(), but on many systems this was omitting a few pages,
as many systems mark a few pages below the ISA memory region as
reserved in the e820 map.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/enlighten.c |    1 -
 arch/x86/xen/mmu.c       |   23 -----------------------
 arch/x86/xen/setup.c     |   41 ++++++++++++++++++++++++++++++++++-------
 arch/x86/xen/xen-ops.h   |    1 -
 4 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a8f8844..ff9a20a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1306,7 +1306,6 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_raw_console_write("mapping kernel into physical memory\n");
 	pgd = xen_setup_kernel_pagetable(pgd, xen_start_info->nr_pages);
-	xen_ident_map_ISA();
 
 	/* Allocate and initialize top and mid mfn levels for p2m structure */
 	xen_build_mfn_list_list();
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b8e2794..b756d8c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 #endif
 }
 
-void __init xen_ident_map_ISA(void)
-{
-	unsigned long pa;
-
-	/*
-	 * If we're dom0, then linear map the ISA machine addresses into
-	 * the kernel's address space.
-	 */
-	if (!xen_initial_domain())
-		return;
-
-	xen_raw_printk("Xen: setup ISA identity maps\n");
-
-	for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
-		pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
-
-		if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))
-			BUG();
-	}
-
-	xen_flush_tlb();
-}
-
 static void __init xen_post_allocator_init(void)
 {
 	pv_mmu_ops.set_pte = xen_set_pte;
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 506a3e6..d5f8714 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -139,6 +139,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 
 	return len;
 }
+
+static unsigned long __init xen_release_chunk(unsigned long start,
+					      unsigned long end)
+{
+	return xen_do_chunk(start, end, true);
+}
+
 static unsigned long __init xen_populate_chunk(
 	const struct e820entry *list, size_t map_size,
 	unsigned long max_pfn, unsigned long *last_pfn,
@@ -197,6 +204,29 @@ static unsigned long __init xen_populate_chunk(
 	}
 	return done;
 }
+
+static void __init xen_set_identity_and_release_chunk(
+	unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
+	unsigned long *released, unsigned long *identity)
+{
+	unsigned long pfn;
+
+	/*
+	 * If the PFNs are currently mapped, the VA mapping also needs
+	 * to be updated to be 1:1.
+	 */
+	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
+		(void)HYPERVISOR_update_va_mapping(
+			(unsigned long)__va(pfn << PAGE_SHIFT),
+			mfn_pte(pfn, PAGE_KERNEL_IO), 0);
+
+	if (start_pfn < nr_pages)
+		*released += xen_release_chunk(
+			start_pfn, min(end_pfn, nr_pages));
+
+	*identity += set_phys_range_identity(start_pfn, end_pfn);
+}
+
 static unsigned long __init xen_set_identity_and_release(
 	const struct e820entry *list, size_t map_size, unsigned long nr_pages)
 {
@@ -226,14 +256,11 @@ static unsigned long __init xen_set_identity_and_release(
 			if (entry->type == E820_RAM)
 				end_pfn = PFN_UP(entry->addr);
 
-			if (start_pfn < end_pfn) {
-				if (start_pfn < nr_pages)
-					released += xen_do_chunk(
-						start_pfn, min(end_pfn, nr_pages), true);
+			if (start_pfn < end_pfn)
+				xen_set_identity_and_release_chunk(
+					start_pfn, end_pfn, nr_pages,
+					&released, &identity);
 
-				identity += set_phys_range_identity(
-					start_pfn, end_pfn);
-			}
 			start = end;
 		}
 	}
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index b095739..506fa08 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -28,7 +28,6 @@ void xen_setup_shared_info(void);
 void xen_build_mfn_list_list(void);
 void xen_setup_machphys_mapping(void);
 pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
-void xen_ident_map_ISA(void);
 void xen_reserve_top(void);
 extern unsigned long xen_max_p2m_pfn;
-- 
1.7.2.5
Konrad Rzeszutek Wilk
2012-May-03 15:37 UTC
Re: [Xen-devel] [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.
On Thu, May 03, 2012 at 12:58:05PM +0100, David Vrabel wrote:
> On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
>> They use the same set of arguments, so it is just the matter
>> of using the proper hypercall.
>>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> Acked-by: David Vrabel <david.vrabel@citrix.com>
>
> But would it be better to fold this into the previous patch?

Nah, one logical change per patch.

>
> David
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
David Vrabel
2012-May-03 16:27 UTC
Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
On 03/05/12 16:15, David Vrabel wrote:
>
> xen: update VA mapping when releasing memory during setup
>
> In xen_memory_setup(), if a page that is being released has a VA
> mapping this must also be updated. Otherwise, the page will not be
> released completely -- it will still be referenced in Xen and won't be
> freed until the mapping is removed and this prevents it from being
> reallocated at a different PFN.
>
> This was already being done for the ISA memory region in
> xen_ident_map_ISA() but on many systems this was omitting a few pages
> as many systems marked a few pages below the ISA memory region as
> reserved in the e820 map.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
[...]
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>  #endif
>  }
>
> -void __init xen_ident_map_ISA(void)
> -{
> -	unsigned long pa;
> -
> -	/*
> -	 * If we're dom0, then linear map the ISA machine addresses into
> -	 * the kernel's address space.
> -	 */
> -	if (!xen_initial_domain())
> -		return;

It might look like this test has gone; however, the new code which
updates the VA mapping uses the e820 map, and for a domU its map will
not have an ISA region, so there's no mapping to be updated.

David

> [...]
Konrad Rzeszutek Wilk
2012-May-07 18:48 UTC
Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
On Thu, May 03, 2012 at 05:27:27PM +0100, David Vrabel wrote:
> On 03/05/12 16:15, David Vrabel wrote:
> >
> > xen: update VA mapping when releasing memory during setup
> >
> > In xen_memory_setup(), if a page that is being released has a VA
> > mapping this must also be updated. Otherwise, the page will not be
> > released completely -- it will still be referenced in Xen and won't be
> > freed until the mapping is removed and this prevents it from being
> > reallocated at a different PFN.
> >
> > This was already being done for the ISA memory region in
> > xen_ident_map_ISA() but on many systems this was omitting a few pages
> > as many systems marked a few pages below the ISA memory region as
> > reserved in the e820 map.
> >
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> > ---
> [...]
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
> >  #endif
> >  }
> >
> > -void __init xen_ident_map_ISA(void)
> > -{
> > -	unsigned long pa;
> > -
> > -	/*
> > -	 * If we're dom0, then linear map the ISA machine addresses into
> > -	 * the kernel's address space.
> > -	 */
> > -	if (!xen_initial_domain())
> > -		return;
>
> It might look like this test has gone, however the new code which
> updates the VA mapping uses the e820 map and for a domU its map will not
> have a ISA region so there's no mapping to be updated.

What if you use e820_hole=1 and the pci=xx in the guest?

> [...]
David Vrabel
2012-May-08 18:12 UTC
Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
On 07/05/12 19:48, Konrad Rzeszutek Wilk wrote:
> On Thu, May 03, 2012 at 05:27:27PM +0100, David Vrabel wrote:
>> On 03/05/12 16:15, David Vrabel wrote:
>>>
>>> xen: update VA mapping when releasing memory during setup
>>>
>>> In xen_memory_setup(), if a page that is being released has a VA
>>> mapping this must also be updated. Otherwise, the page will not be
>>> released completely -- it will still be referenced in Xen and won't be
>>> freed until the mapping is removed and this prevents it from being
>>> reallocated at a different PFN.
>>>
>>> This was already being done for the ISA memory region in
>>> xen_ident_map_ISA() but on many systems this was omitting a few pages
>>> as many systems marked a few pages below the ISA memory region as
>>> reserved in the e820 map.
>>>
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>> ---
>> [...]
>>> --- a/arch/x86/xen/mmu.c
>>> +++ b/arch/x86/xen/mmu.c
>>> @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>>>  #endif
>>>  }
>>>
>>> -void __init xen_ident_map_ISA(void)
>>> -{
>>> -	unsigned long pa;
>>> -
>>> -	/*
>>> -	 * If we're dom0, then linear map the ISA machine addresses into
>>> -	 * the kernel's address space.
>>> -	 */
>>> -	if (!xen_initial_domain())
>>> -		return;
>>
>> It might look like this test has gone, however the new code which
>> updates the VA mapping uses the e820 map and for a domU its map will not
>> have a ISA region so there's no mapping to be updated.
>
> What if you use e820_hole=1 and the pci=xx in the guest?

Are these xl configuration options? I'm not familiar with xl.

The PCI memory hole should be above 3 GiB, so this hole will be well
above the memory that will be initially mapped at boot.

I've not managed to persuade my test box to pass through a PCI device to
a guest (using xapi as the toolstack) to confirm, though. I'll have
another go tomorrow.

David
Konrad Rzeszutek Wilk
2012-May-08 18:24 UTC
Re: [Xen-devel] [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).
On Tue, May 08, 2012 at 07:12:18PM +0100, David Vrabel wrote:
> On 07/05/12 19:48, Konrad Rzeszutek Wilk wrote:
> > On Thu, May 03, 2012 at 05:27:27PM +0100, David Vrabel wrote:
> >> On 03/05/12 16:15, David Vrabel wrote:
> >>>
> >>> xen: update VA mapping when releasing memory during setup
> >>>
> >>> In xen_memory_setup(), if a page that is being released has a VA
> >>> mapping this must also be updated. Otherwise, the page will not be
> >>> released completely -- it will still be referenced in Xen and won't be
> >>> freed until the mapping is removed and this prevents it from being
> >>> reallocated at a different PFN.
> >>>
> >>> This was already being done for the ISA memory region in
> >>> xen_ident_map_ISA() but on many systems this was omitting a few pages
> >>> as many systems marked a few pages below the ISA memory region as
> >>> reserved in the e820 map.
> >>>
> >>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> >>> ---
> >> [...]
> >>> -void __init xen_ident_map_ISA(void)
> >>> -{
> >>> -	unsigned long pa;
> >>> -
> >>> -	/*
> >>> -	 * If we're dom0, then linear map the ISA machine addresses into
> >>> -	 * the kernel's address space.
> >>> -	 */
> >>> -	if (!xen_initial_domain())
> >>> -		return;
> >>
> >> It might look like this test has gone, however the new code which
> >> updates the VA mapping uses the e820 map and for a domU its map will not
> >> have a ISA region so there's no mapping to be updated.
> >
> > What if you use e820_hole=1 and the pci=xx in the guest?
>
> Are these xl configuration options? I'm not familiar with xl.

Yeah. Just have this in your guest config:

 e820_hole=1
 pci=["01:00.0"]

(And do the PCI bind/unbind to the xen-pciback module.)

But looking at the source there is this comment:

 /* Weed out anything under 1MB */

so I think we are fine.

> The PCI memory hole should be above 3 GiB so this hole will be well
> above the memory that will be initially mapped at boot.
>
> I've not managed to persuade my test box to pass through a PCI device to
> a guest (using xapi as the toolstack) to confirm, though. I'll have
> another go tomorrow.
>
> David