Daniel Kiper
2010-Aug-12 01:22 UTC
[Xen-devel] [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
Hi, Here is the third version of memory hotplug support for Xen guests patch. This one cleanly applies to git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git repository, xen/memory-hotplug head. On Fri, Aug 06, 2010 at 04:03:18PM +0400, Vasiliy G Tolstov wrote: [...]> Testing on sles 11 sp1 and opensuse 11.3. On results - send e-mail..Thx. On Fri, Aug 06, 2010 at 12:34:08PM -0400, Konrad Rzeszutek Wilk wrote: [...]> > +static int allocate_additional_memory(unsigned long nr_pages) > > +{ > > + long rc; > > + resource_size_t r_min, r_size; > > + struct resource *r; > > + struct xen_memory_reservation reservation = { > > + .address_bits = 0, > > + .extent_order = 0, > > + .domid = DOMID_SELF > > + }; > > + unsigned long flags, i, pfn; > > + > > + if (nr_pages > ARRAY_SIZE(frame_list)) > > + nr_pages = ARRAY_SIZE(frame_list); > > + > > + spin_lock_irqsave(&balloon_lock, flags); > > + > > + if (!is_memory_resource_reserved()) { > > + > > + /* > > + * Look for first unused memory region starting at page > > + * boundary. Skip last memory section created at boot time > > + * becuase it may contains unused memory pages with PG_reserved > > + * bit not set (online_pages require PG_reserved bit set). > > + */ > > + > > + r = kzalloc(sizeof(struct resource), GFP_KERNEL); > > You are holding a spinlock here. Kzalloc can sleepThx. Fixed. On Fri, Aug 06, 2010 at 10:42:48AM -0700, Jeremy Fitzhardinge wrote:> > - PV on HVM mode is supported now; it was tested on > > git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git > > repository, 2.6.34-pvhvm head, > > Good. I noticed you have some specific tests for "xen_pv_domain()" - > are there many differences between pv and hvm?No. Only those changes are needed where xen_domain()/xen_pv_domain() is used.> >>>+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG > >>>+static inline unsigned long current_target(void) > >>>+{ > >>>+ return balloon_stats.target_pages; > >>Why does this need its own version? > >Because original version return values not bigger > >then initial memory allocation which does not allow > >memory hotplug to function. > > But surely they can be combined? A system without > XEN_BALLOON_MEMORY_HOTPLUG is identical to a system with > XEN_BALLOON_MEMORY_HOTPLUG which hasn''t yet added any memory. Some > variables may become constants (because memory can never be hot-added), > but the logic of the code should be the same.Done.> Overall, this looks much better. The next step is to split this into at > least two patches: one for the core code, and one for the Xen bits. > Each patch should do just one logical operation, so if you have several > distinct changes to the core code, put them in separate patches.I will do that if this patch will be accepted.> >diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > >index 38434da..beb1aa7 100644 > >--- a/arch/x86/Kconfig > >+++ b/arch/x86/Kconfig > >@@ -1273,7 +1273,7 @@ config ARCH_SELECT_MEMORY_MODEL > > depends on ARCH_SPARSEMEM_ENABLE > > > > config ARCH_MEMORY_PROBE > >- def_bool y > >+ def_bool X86_64&& !XEN > > depends on MEMORY_HOTPLUG > > The trouble with making anything statically depend on Xen at config time > is that you lose it even if you''re not running under Xen. A pvops > kernel can run on bare hardware as well, and we don''t want to lose > functionality (assume that CONFIG_XEN is always set, since distros do > always set it). > > Can you find a clean way to prevent/disable ARCH_MEMORY_PROBE at runtime > when in a Xen context?There is no simple way to do that. It requiers to do some changes in drivers/base/memory.c code. I think it should be done as kernel boot option (on by default to not break things using this interface now). If it be useful for maintainers of mm/memory_hotplug.c and drivers/base/memory.c code then I could do that. Currently original arch/x86/Kconfig version is restored.> >+/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG > >*/ > >+static int __ref xen_add_memory(int nid, u64 start, u64 size) > > Could this be __meminit too then?Good question. I looked throught the code and could not find any simple explanation why mm/memory_hotplug.c authors used __ref instead __meminit. Could you (mm/memory_hotplug.c authors/maintainers) tell us why ???> >+{ > >+ pg_data_t *pgdat = NULL; > >+ int new_pgdat = 0, ret; > >+ > >+ lock_system_sleep(); > > What''s this for? I see all its other users are in the memory hotplug > code, but presumably they''re concerned about a real S3 suspend. Do we > care about that here?Yes, because as I know S3 state is supported by Xen guests.> Actually, this is nearly identical to mm/memory_hotplug.c:add_memory(). > It looks to me like you should: > > * pull the common core out into mm/memory_hotplug.c:__add_memory() > (or a better name) > * make add_memory() do its > register_memory_resource()/firmware_map_add_hotplug() around that > (assuming they''re definitely unwanted in the Xen case) > * make xen_add_memory() just call __add_memory() along with whatever > else it needs (which is nothing?) > > That way you can export a high-level __add_memory function from > memory_hotplug.c rather than the two internal detail functions.Done.> >+ r->name = "System RAM"; > > How about making it clear its Xen hotplug RAM? Or do things care about > the "System RAM" name?As I know no however as I saw anybody do not differentiate between normal and hotplugged memory. I thought about that ealier however stated that this soultion does not give us any real gain. That is why I decided to use standard name for hotplugged memory. If you have a questions please drop me a line. Daniel Signed-off-by: Daniel Kiper <dkiper@net-space.pl> --- arch/x86/Kconfig | 2 +- drivers/xen/balloon.c | 95 ++++++++------------------------------- include/linux/memory_hotplug.h | 3 +- mm/memory_hotplug.c | 55 ++++++++++++++++------- 4 files changed, 61 insertions(+), 94 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beb1aa7..9458685 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1273,7 +1273,7 @@ config ARCH_SELECT_MEMORY_MODEL depends on ARCH_SPARSEMEM_ENABLE config ARCH_MEMORY_PROBE - def_bool X86_64 && !XEN + def_bool X86_64 depends on MEMORY_HOTPLUG config ILLEGAL_POINTER_VALUE diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 31edc26..5120075 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -193,63 +193,11 @@ static void balloon_alarm(unsigned long unused) } #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG -static inline unsigned long current_target(void) -{ - return balloon_stats.target_pages; -} - static inline u64 is_memory_resource_reserved(void) { return balloon_stats.hotplug_start_paddr; } -/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ -static int __ref xen_add_memory(int nid, u64 start, u64 size) -{ - pg_data_t *pgdat = NULL; - int new_pgdat = 0, ret; - - lock_system_sleep(); - - if (!node_online(nid)) { - pgdat = hotadd_new_pgdat(nid, start); - ret = -ENOMEM; - if (!pgdat) - goto out; - new_pgdat = 1; - } - - /* call arch''s memory hotadd */ - ret = arch_add_memory(nid, start, size); - - if (ret < 0) - goto error; - - /* we online node here. we can''t roll back from here. */ - node_set_online(nid); - - if (new_pgdat) { - ret = register_one_node(nid); - /* - * If sysfs file of new node can''t create, cpu on the node - * can''t be hot-added. There is no rollback way now. - * So, check by BUG_ON() to catch it reluctantly.. - */ - BUG_ON(ret); - } - - goto out; - -error: - /* rollback pgdat allocation */ - if (new_pgdat) - rollback_node_hotadd(nid, pgdat); - -out: - unlock_system_sleep(); - return ret; -} - static int allocate_additional_memory(unsigned long nr_pages) { long rc; @@ -265,8 +213,6 @@ static int allocate_additional_memory(unsigned long nr_pages) if (nr_pages > ARRAY_SIZE(frame_list)) nr_pages = ARRAY_SIZE(frame_list); - spin_lock_irqsave(&balloon_lock, flags); - if (!is_memory_resource_reserved()) { /* @@ -280,7 +226,7 @@ static int allocate_additional_memory(unsigned long nr_pages) if (!r) { rc = -ENOMEM; - goto out; + goto out_0; } r->name = "System RAM"; @@ -293,12 +239,14 @@ static int allocate_additional_memory(unsigned long nr_pages) if (rc < 0) { kfree(r); - goto out; + goto out_0; } balloon_stats.hotplug_start_paddr = r->start; } + spin_lock_irqsave(&balloon_lock, flags); + pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.hotplug_size); for (i = 0; i < nr_pages; ++i, ++pfn) @@ -310,7 +258,7 @@ static int allocate_additional_memory(unsigned long nr_pages) rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); if (rc < 0) - goto out; + goto out_1; pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.hotplug_size); @@ -323,9 +271,10 @@ static int allocate_additional_memory(unsigned long nr_pages) balloon_stats.hotplug_size += rc << PAGE_SHIFT; balloon_stats.current_pages += rc; -out: +out_1: spin_unlock_irqrestore(&balloon_lock, flags); +out_0: return rc < 0 ? rc : rc != nr_pages; } @@ -337,11 +286,11 @@ static void hotplug_allocated_memory(void) nid = memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr); - ret = xen_add_memory(nid, balloon_stats.hotplug_start_paddr, + ret = add_registered_memory(nid, balloon_stats.hotplug_start_paddr, balloon_stats.hotplug_size); if (ret) { - pr_err("%s: xen_add_memory: Memory hotplug failed: %i\n", + pr_err("%s: add_registered_memory: Memory hotplug failed: %i\n", __func__, ret); goto error; } @@ -388,18 +337,6 @@ out: balloon_stats.hotplug_size = 0; } #else -static unsigned long current_target(void) -{ - unsigned long target = balloon_stats.target_pages; - - target = min(target, - balloon_stats.current_pages + - balloon_stats.balloon_low + - balloon_stats.balloon_high); - - return target; -} - static inline u64 is_memory_resource_reserved(void) { return 0; @@ -407,13 +344,21 @@ static inline u64 is_memory_resource_reserved(void) static inline int allocate_additional_memory(unsigned long nr_pages) { + /* + * CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not set. + * balloon_stats.target_pages could not be bigger + * than balloon_stats.current_pages because additional + * memory allocation is not possible. + */ + balloon_stats.target_pages = balloon_stats.current_pages; + return 0; } static inline void hotplug_allocated_memory(void) { } -#endif +#endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */ static int increase_reservation(unsigned long nr_pages) { @@ -553,7 +498,7 @@ static void balloon_process(struct work_struct *work) mutex_lock(&balloon_mutex); do { - credit = current_target() - balloon_stats.current_pages; + credit = balloon_stats.target_pages - balloon_stats.current_pages; if (credit > 0) { if (balloon_stats.balloon_low || balloon_stats.balloon_high) @@ -572,7 +517,7 @@ static void balloon_process(struct work_struct *work) } while ((credit != 0) && !need_sleep); /* Schedule more work if there is some still to be done. */ - if (current_target() != balloon_stats.current_pages) + if (balloon_stats.target_pages != balloon_stats.current_pages) mod_timer(&balloon_timer, jiffies + HZ); else if (is_memory_resource_reserved()) hotplug_allocated_memory(); diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 6652eae..37f1894 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -202,8 +202,7 @@ static inline int is_mem_section_removable(unsigned long pfn, } #endif /* CONFIG_MEMORY_HOTREMOVE */ -extern pg_data_t *hotadd_new_pgdat(int nid, u64 start); -extern void rollback_node_hotadd(int nid, pg_data_t *pgdat); +extern int add_registered_memory(int nid, u64 start, u64 size); extern int add_memory(int nid, u64 start, u64 size); extern int arch_add_memory(int nid, u64 start, u64 size); extern int remove_memory(u64 start, u64 size); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 143e03c..48a65bb 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -453,7 +453,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages) #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ -pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) +static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) { struct pglist_data *pgdat; unsigned long zones_size[MAX_NR_ZONES] = {0}; @@ -473,32 +473,21 @@ pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) return pgdat; } -EXPORT_SYMBOL_GPL(hotadd_new_pgdat); -void rollback_node_hotadd(int nid, pg_data_t *pgdat) +static void rollback_node_hotadd(int nid, pg_data_t *pgdat) { arch_refresh_nodedata(nid, NULL); arch_free_nodedata(pgdat); return; } -EXPORT_SYMBOL_GPL(rollback_node_hotadd); - /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ -int __ref add_memory(int nid, u64 start, u64 size) +static int __ref __add_memory(int nid, u64 start, u64 size) { pg_data_t *pgdat = NULL; int new_pgdat = 0; - struct resource *res; int ret; - lock_system_sleep(); - - res = register_memory_resource(start, size); - ret = -EEXIST; - if (!res) - goto out; - if (!node_online(nid)) { pgdat = hotadd_new_pgdat(nid, start); ret = -ENOMEM; @@ -535,11 +524,45 @@ error: /* rollback pgdat allocation and others */ if (new_pgdat) rollback_node_hotadd(nid, pgdat); - if (res) - release_memory_resource(res); out: + return ret; +} + +int __ref add_registered_memory(int nid, u64 start, u64 size) +{ + int ret; + + lock_system_sleep(); + ret = __add_memory(nid, start, size); unlock_system_sleep(); + + return ret; +} +EXPORT_SYMBOL_GPL(add_registered_memory); + +int __ref add_memory(int nid, u64 start, u64 size) +{ + int ret = -EEXIST; + struct resource *res; + + lock_system_sleep(); + + res = register_memory_resource(start, size); + + if (!res) + goto out; + + ret = __add_memory(nid, start, size); + + if (!ret) + goto out; + + release_memory_resource(res); + +out: + unlock_system_sleep(); + return ret; } EXPORT_SYMBOL_GPL(add_memory); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vasiliy G Tolstov
2010-Aug-12 07:44 UTC
Re: [Xen-devel] [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
В Чтв, 12/08/2010 в 03:22 +0200, Daniel Kiper пишет:> Hi, > > Here is the third version of memory hotplug support > for Xen guests patch. This one cleanly applies to > git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git > repository, xen/memory-hotplug head. > > On Fri, Aug 06, 2010 at 04:03:18PM +0400, Vasiliy G Tolstov wrote: > [...] > > Testing on sles 11 sp1 and opensuse 11.3. On results - send e-mail.. > > Thx. > > On Fri, Aug 06, 2010 at 12:34:08PM -0400, Konrad Rzeszutek Wilk wrote: > [...] > > > +static int allocate_additional_memory(unsigned long nr_pages) > > > +{ > > > + long rc; > > > + resource_size_t r_min, r_size; > > > + struct resource *r; > > > + struct xen_memory_reservation reservation = { > > > + .address_bits = 0, > > > + .extent_order = 0, > > > + .domid = DOMID_SELF > > > + }; > > > + unsigned long flags, i, pfn; > > > + > > > + if (nr_pages > ARRAY_SIZE(frame_list)) > > > + nr_pages = ARRAY_SIZE(frame_list); > > > + > > > + spin_lock_irqsave(&balloon_lock, flags); > > > + > > > + if (!is_memory_resource_reserved()) { > > > + > > > + /* > > > + * Look for first unused memory region starting at page > > > + * boundary. Skip last memory section created at boot time > > > + * becuase it may contains unused memory pages with PG_reserved > > > + * bit not set (online_pages require PG_reserved bit set). > > > + */ > > > + > > > + r = kzalloc(sizeof(struct resource), GFP_KERNEL); > > > > You are holding a spinlock here. Kzalloc can sleep > > Thx. Fixed. > > On Fri, Aug 06, 2010 at 10:42:48AM -0700, Jeremy Fitzhardinge wrote: > > > - PV on HVM mode is supported now; it was tested on > > > git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git > > > repository, 2.6.34-pvhvm head, > > > > Good. I noticed you have some specific tests for "xen_pv_domain()" - > > are there many differences between pv and hvm? > > No. Only those changes are needed where > xen_domain()/xen_pv_domain() is used. > > > >>>+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG > > >>>+static inline unsigned long current_target(void) > > >>>+{ > > >>>+ return balloon_stats.target_pages; > > >>Why does this need its own version? > > >Because original version return values not bigger > > >then initial memory allocation which does not allow > > >memory hotplug to function. > > > > But surely they can be combined? A system without > > XEN_BALLOON_MEMORY_HOTPLUG is identical to a system with > > XEN_BALLOON_MEMORY_HOTPLUG which hasn''t yet added any memory. Some > > variables may become constants (because memory can never be hot-added), > > but the logic of the code should be the same. > > Done. > > > Overall, this looks much better. The next step is to split this into at > > least two patches: one for the core code, and one for the Xen bits. > > Each patch should do just one logical operation, so if you have several > > distinct changes to the core code, put them in separate patches. > > I will do that if this patch will be accepted. > > > >diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > >index 38434da..beb1aa7 100644 > > >--- a/arch/x86/Kconfig > > >+++ b/arch/x86/Kconfig > > >@@ -1273,7 +1273,7 @@ config ARCH_SELECT_MEMORY_MODEL > > > depends on ARCH_SPARSEMEM_ENABLE > > > > > > config ARCH_MEMORY_PROBE > > >- def_bool y > > >+ def_bool X86_64&& !XEN > > > depends on MEMORY_HOTPLUG > > > > The trouble with making anything statically depend on Xen at config time > > is that you lose it even if you''re not running under Xen. A pvops > > kernel can run on bare hardware as well, and we don''t want to lose > > functionality (assume that CONFIG_XEN is always set, since distros do > > always set it). > > > > Can you find a clean way to prevent/disable ARCH_MEMORY_PROBE at runtime > > when in a Xen context? > > There is no simple way to do that. It requiers to do some > changes in drivers/base/memory.c code. I think it should > be done as kernel boot option (on by default to not break > things using this interface now). If it be useful for maintainers > of mm/memory_hotplug.c and drivers/base/memory.c code then > I could do that. Currently original arch/x86/Kconfig version > is restored. > > > >+/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG > > >*/ > > >+static int __ref xen_add_memory(int nid, u64 start, u64 size) > > > > Could this be __meminit too then? > > Good question. I looked throught the code and could > not find any simple explanation why mm/memory_hotplug.c > authors used __ref instead __meminit. Could you (mm/memory_hotplug.c > authors/maintainers) tell us why ??? > > > >+{ > > >+ pg_data_t *pgdat = NULL; > > >+ int new_pgdat = 0, ret; > > >+ > > >+ lock_system_sleep(); > > > > What''s this for? I see all its other users are in the memory hotplug > > code, but presumably they''re concerned about a real S3 suspend. Do we > > care about that here? > > Yes, because as I know S3 state is supported by Xen guests. > > > Actually, this is nearly identical to mm/memory_hotplug.c:add_memory(). > > It looks to me like you should: > > > > * pull the common core out into mm/memory_hotplug.c:__add_memory() > > (or a better name) > > * make add_memory() do its > > register_memory_resource()/firmware_map_add_hotplug() around that > > (assuming they''re definitely unwanted in the Xen case) > > * make xen_add_memory() just call __add_memory() along with whatever > > else it needs (which is nothing?) > > > > That way you can export a high-level __add_memory function from > > memory_hotplug.c rather than the two internal detail functions. > > Done. > > > >+ r->name = "System RAM"; > > > > How about making it clear its Xen hotplug RAM? Or do things care about > > the "System RAM" name? > > As I know no however as I saw anybody do not differentiate between > normal and hotplugged memory. I thought about that ealier however > stated that this soultion does not give us any real gain. That is why > I decided to use standard name for hotplugged memory. > > If you have a questions please drop me a line. > > Daniel > > Signed-off-by: Daniel Kiper <dkiper@net-space.pl> > --- > arch/x86/Kconfig | 2 +- > drivers/xen/balloon.c | 95 ++++++++------------------------------- > include/linux/memory_hotplug.h | 3 +- > mm/memory_hotplug.c | 55 ++++++++++++++++------- > 4 files changed, 61 insertions(+), 94 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index beb1aa7..9458685 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1273,7 +1273,7 @@ config ARCH_SELECT_MEMORY_MODEL > depends on ARCH_SPARSEMEM_ENABLE > > config ARCH_MEMORY_PROBE > - def_bool X86_64 && !XEN > + def_bool X86_64 > depends on MEMORY_HOTPLUG > > config ILLEGAL_POINTER_VALUE > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > index 31edc26..5120075 100644 > --- a/drivers/xen/balloon.c > +++ b/drivers/xen/balloon.c > @@ -193,63 +193,11 @@ static void balloon_alarm(unsigned long unused) > } > > #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG > -static inline unsigned long current_target(void) > -{ > - return balloon_stats.target_pages; > -} > - > static inline u64 is_memory_resource_reserved(void) > { > return balloon_stats.hotplug_start_paddr; > } > > -/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ > -static int __ref xen_add_memory(int nid, u64 start, u64 size) > -{ > - pg_data_t *pgdat = NULL; > - int new_pgdat = 0, ret; > - > - lock_system_sleep(); > - > - if (!node_online(nid)) { > - pgdat = hotadd_new_pgdat(nid, start); > - ret = -ENOMEM; > - if (!pgdat) > - goto out; > - new_pgdat = 1; > - } > - > - /* call arch''s memory hotadd */ > - ret = arch_add_memory(nid, start, size); > - > - if (ret < 0) > - goto error; > - > - /* we online node here. we can''t roll back from here. */ > - node_set_online(nid); > - > - if (new_pgdat) { > - ret = register_one_node(nid); > - /* > - * If sysfs file of new node can''t create, cpu on the node > - * can''t be hot-added. There is no rollback way now. > - * So, check by BUG_ON() to catch it reluctantly.. > - */ > - BUG_ON(ret); > - } > - > - goto out; > - > -error: > - /* rollback pgdat allocation */ > - if (new_pgdat) > - rollback_node_hotadd(nid, pgdat); > - > -out: > - unlock_system_sleep(); > - return ret; > -} > - > static int allocate_additional_memory(unsigned long nr_pages) > { > long rc; > @@ -265,8 +213,6 @@ static int allocate_additional_memory(unsigned long nr_pages) > if (nr_pages > ARRAY_SIZE(frame_list)) > nr_pages = ARRAY_SIZE(frame_list); > > - spin_lock_irqsave(&balloon_lock, flags); > - > if (!is_memory_resource_reserved()) { > > /* > @@ -280,7 +226,7 @@ static int allocate_additional_memory(unsigned long nr_pages) > > if (!r) { > rc = -ENOMEM; > - goto out; > + goto out_0; > } > > r->name = "System RAM"; > @@ -293,12 +239,14 @@ static int allocate_additional_memory(unsigned long nr_pages) > > if (rc < 0) { > kfree(r); > - goto out; > + goto out_0; > } > > balloon_stats.hotplug_start_paddr = r->start; > } > > + spin_lock_irqsave(&balloon_lock, flags); > + > pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.hotplug_size); > > for (i = 0; i < nr_pages; ++i, ++pfn) > @@ -310,7 +258,7 @@ static int allocate_additional_memory(unsigned long nr_pages) > rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); > > if (rc < 0) > - goto out; > + goto out_1; > > pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.hotplug_size); > > @@ -323,9 +271,10 @@ static int allocate_additional_memory(unsigned long nr_pages) > balloon_stats.hotplug_size += rc << PAGE_SHIFT; > balloon_stats.current_pages += rc; > > -out: > +out_1: > spin_unlock_irqrestore(&balloon_lock, flags); > > +out_0: > return rc < 0 ? rc : rc != nr_pages; > } > > @@ -337,11 +286,11 @@ static void hotplug_allocated_memory(void) > > nid = memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr); > > - ret = xen_add_memory(nid, balloon_stats.hotplug_start_paddr, > + ret = add_registered_memory(nid, balloon_stats.hotplug_start_paddr, > balloon_stats.hotplug_size); > > if (ret) { > - pr_err("%s: xen_add_memory: Memory hotplug failed: %i\n", > + pr_err("%s: add_registered_memory: Memory hotplug failed: %i\n", > __func__, ret); > goto error; > } > @@ -388,18 +337,6 @@ out: > balloon_stats.hotplug_size = 0; > } > #else > -static unsigned long current_target(void) > -{ > - unsigned long target = balloon_stats.target_pages; > - > - target = min(target, > - balloon_stats.current_pages + > - balloon_stats.balloon_low + > - balloon_stats.balloon_high); > - > - return target; > -} > - > static inline u64 is_memory_resource_reserved(void) > { > return 0; > @@ -407,13 +344,21 @@ static inline u64 is_memory_resource_reserved(void) > > static inline int allocate_additional_memory(unsigned long nr_pages) > { > + /* > + * CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not set. > + * balloon_stats.target_pages could not be bigger > + * than balloon_stats.current_pages because additional > + * memory allocation is not possible. > + */ > + balloon_stats.target_pages = balloon_stats.current_pages; > + > return 0; > } > > static inline void hotplug_allocated_memory(void) > { > } > -#endif > +#endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */ > > static int increase_reservation(unsigned long nr_pages) > { > @@ -553,7 +498,7 @@ static void balloon_process(struct work_struct *work) > mutex_lock(&balloon_mutex); > > do { > - credit = current_target() - balloon_stats.current_pages; > + credit = balloon_stats.target_pages - balloon_stats.current_pages; > > if (credit > 0) { > if (balloon_stats.balloon_low || balloon_stats.balloon_high) > @@ -572,7 +517,7 @@ static void balloon_process(struct work_struct *work) > } while ((credit != 0) && !need_sleep); > > /* Schedule more work if there is some still to be done. */ > - if (current_target() != balloon_stats.current_pages) > + if (balloon_stats.target_pages != balloon_stats.current_pages) > mod_timer(&balloon_timer, jiffies + HZ); > else if (is_memory_resource_reserved()) > hotplug_allocated_memory(); > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > index 6652eae..37f1894 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -202,8 +202,7 @@ static inline int is_mem_section_removable(unsigned long pfn, > } > #endif /* CONFIG_MEMORY_HOTREMOVE */ > > -extern pg_data_t *hotadd_new_pgdat(int nid, u64 start); > -extern void rollback_node_hotadd(int nid, pg_data_t *pgdat); > +extern int add_registered_memory(int nid, u64 start, u64 size); > extern int add_memory(int nid, u64 start, u64 size); > extern int arch_add_memory(int nid, u64 start, u64 size); > extern int remove_memory(u64 start, u64 size); > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 143e03c..48a65bb 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -453,7 +453,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages) > #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ > > /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ > -pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) > +static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) > { > struct pglist_data *pgdat; > unsigned long zones_size[MAX_NR_ZONES] = {0}; > @@ -473,32 +473,21 @@ pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) > > return pgdat; > } > -EXPORT_SYMBOL_GPL(hotadd_new_pgdat); > > -void rollback_node_hotadd(int nid, pg_data_t *pgdat) > +static void rollback_node_hotadd(int nid, pg_data_t *pgdat) > { > arch_refresh_nodedata(nid, NULL); > arch_free_nodedata(pgdat); > return; > } > -EXPORT_SYMBOL_GPL(rollback_node_hotadd); > - > > /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ > -int __ref add_memory(int nid, u64 start, u64 size) > +static int __ref __add_memory(int nid, u64 start, u64 size) > { > pg_data_t *pgdat = NULL; > int new_pgdat = 0; > - struct resource *res; > int ret; > > - lock_system_sleep(); > - > - res = register_memory_resource(start, size); > - ret = -EEXIST; > - if (!res) > - goto out; > - > if (!node_online(nid)) { > pgdat = hotadd_new_pgdat(nid, start); > ret = -ENOMEM; > @@ -535,11 +524,45 @@ error: > /* rollback pgdat allocation and others */ > if (new_pgdat) > rollback_node_hotadd(nid, pgdat); > - if (res) > - release_memory_resource(res); > > out: > + return ret; > +} > + > +int __ref add_registered_memory(int nid, u64 start, u64 size) > +{ > + int ret; > + > + lock_system_sleep(); > + ret = __add_memory(nid, start, size); > unlock_system_sleep(); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(add_registered_memory); > + > +int __ref add_memory(int nid, u64 start, u64 size) > +{ > + int ret = -EEXIST; > + struct resource *res; > + > + lock_system_sleep(); > + > + res = register_memory_resource(start, size); > + > + if (!res) > + goto out; > + > + ret = __add_memory(nid, start, size); > + > + if (!ret) > + goto out; > + > + release_memory_resource(res); > + > +out: > + unlock_system_sleep(); > + > return ret; > } > EXPORT_SYMBOL_GPL(add_memory); >Can You provide patch to sles 11 sp1 ? I found that sles has some modification to the kernel and patch does not apply cleanly =(. -- Vasiliy G Tolstov <v.tolstov@selfip.ru> Selfip.Ru _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel Kiper
2010-Aug-12 23:20 UTC
Re: [Xen-devel] [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
Hi, On Thu, Aug 12, 2010 at 11:44:01AM +0400, Vasiliy G Tolstov wrote: [...]> Can You provide patch to sles 11 sp1 ? > I found that sles has some modification to the kernel and patch does not > apply cleanly =(.Could you send me address from which I could download source of kernel used by SLES 11 SP1 ??? Daniel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2010-Aug-13 00:19 UTC
Re: [Xen-devel] [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
On Fri, Aug 13, 2010 at 01:20:17AM +0200, Daniel Kiper wrote:> Hi, > > On Thu, Aug 12, 2010 at 11:44:01AM +0400, Vasiliy G Tolstov wrote: > [...] > > Can You provide patch to sles 11 sp1 ? > > I found that sles has some modification to the kernel and patch does not > > apply cleanly =(. > > Could you send me address from which I could > download source of kernel used by SLES 11 SP1 ??? >See: http://wiki.xensource.com/xenwiki/XenDom0Kernels -- Pasi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Aug-13 00:43 UTC
[Xen-devel] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
On 08/11/2010 06:22 PM, Daniel Kiper wrote:>> Overall, this looks much better. The next step is to split this into at >> least two patches: one for the core code, and one for the Xen bits. >> Each patch should do just one logical operation, so if you have several >> distinct changes to the core code, put them in separate patches. > I will do that if this patch will be accepted.First step is to post it to lkml for discussion, cc:ing the relevant maintainers. (I''m not really sure who that is at the moment. It will take some digging around in the history.)>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >>> index 38434da..beb1aa7 100644 >>> --- a/arch/x86/Kconfig >>> +++ b/arch/x86/Kconfig >>> @@ -1273,7 +1273,7 @@ config ARCH_SELECT_MEMORY_MODEL >>> depends on ARCH_SPARSEMEM_ENABLE >>> >>> config ARCH_MEMORY_PROBE >>> - def_bool y >>> + def_bool X86_64&& !XEN >>> depends on MEMORY_HOTPLUG >> The trouble with making anything statically depend on Xen at config time >> is that you lose it even if you''re not running under Xen. A pvops >> kernel can run on bare hardware as well, and we don''t want to lose >> functionality (assume that CONFIG_XEN is always set, since distros do >> always set it). >> >> Can you find a clean way to prevent/disable ARCH_MEMORY_PROBE at runtime >> when in a Xen context? > There is no simple way to do that. It requiers to do some > changes in drivers/base/memory.c code. I think it should > be done as kernel boot option (on by default to not break > things using this interface now). If it be useful for maintainers > of mm/memory_hotplug.c and drivers/base/memory.c code then > I could do that. Currently original arch/x86/Kconfig version > is restored.I think adding a global flag which the Xen balloon driver can disable should be sufficient. There''s no need to make an separate user-settable control.>>> +/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG >>> */ >>> +static int __ref xen_add_memory(int nid, u64 start, u64 size) >> Could this be __meminit too then? > Good question. I looked throught the code and could > not find any simple explanation why mm/memory_hotplug.c > authors used __ref instead __meminit. Could you (mm/memory_hotplug.c > authors/maintainers) tell us why ???Quite possibly a left-over from something else. You could just try making it __meminit, then compile with, erm, the option which shows you section conflicts (it shows the number of conflicts at the end of the kernel build by default, and tells you how to explicitly list them).>>> +{ >>> + pg_data_t *pgdat = NULL; >>> + int new_pgdat = 0, ret; >>> + >>> + lock_system_sleep(); >> What''s this for? I see all its other users are in the memory hotplug >> code, but presumably they''re concerned about a real S3 suspend. Do we >> care about that here? > Yes, because as I know S3 state is supported by Xen guests.Yes, but I''m assuming the interaction between S3 and ACPI hotplug memory isn''t something that concerns a Xen guest; our hotplug mechanism is completely different.>>> + r->name = "System RAM"; >> How about making it clear its Xen hotplug RAM? Or do things care about >> the "System RAM" name? > As I know no however as I saw anybody do not differentiate between > normal and hotplugged memory. I thought about that ealier however > stated that this soultion does not give us any real gain. That is why > I decided to use standard name for hotplugged memory.Its cosmetic, but it would be useful to see what''s going on. I''ll send more detailed comments on the whole patch in a separate mail. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Aug-13 00:46 UTC
[Xen-devel] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
On 08/11/2010 06:22 PM, Daniel Kiper wrote:> Hi, > > Here is the third version of memory hotplug support > for Xen guests patch. This one cleanly applies to > git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git > repository, xen/memory-hotplug head.Thanks. I''ll paste in the full diff and comment on that rather than this incremental update.> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index > fad3df2..4f35eaf 100644 --- a/drivers/xen/Kconfig +++ > b/drivers/xen/Kconfig @@ -9,6 +9,16 @@ config XEN_BALLOON the system > to expand the domain''s memory allocation, or alternatively return > unneeded memory to the system. +config XEN_BALLOON_MEMORY_HOTPLUG + > bool "Xen memory balloon driver with memory hotplug support" + default > n + depends on XEN_BALLOON && MEMORY_HOTPLUG + help + Xen memory > balloon driver with memory hotplug support allows expanding + memory > available for the system above limit declared at system startup. + It > is very useful on critical systems which require long run without + > rebooting. + config XEN_SCRUB_PAGES bool "Scrub pages before returning > them to system" depends on XEN_BALLOON diff --git > a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 1a0d8c2..5120075 > 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -6,6 > +6,7 @@ * Copyright (c) 2003, B Dragovic * Copyright (c) 2003-2004, M > Williamson, K Fraser * Copyright (c) 2005 Dan M. Smith, IBM > Corporation + * Copyright (c) 2010 Daniel Kiper * * This program is > free software; you can redistribute it and/or * modify it under the > terms of the GNU General Public License version 2 @@ -44,6 +45,8 @@ > #include <linux/list.h> #include <linux/sysdev.h> #include > <linux/gfp.h> +#include <linux/memory.h> +#include <linux/suspend.h> > #include <asm/page.h> #include <asm/pgalloc.h> @@ -77,6 +80,11 @@ > struct balloon_stats { /* Number of pages in high- and low-memory > balloons. */ unsigned long balloon_low; unsigned long balloon_high; > +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG + unsigned long > boot_max_pfn; + u64 hotplug_start_paddr; + u64 hotplug_size;So does this mean you only support adding a single hotplug region? What happens if your initial increase wasn''t enough and you want to add more? Would you make this a list of hot-added memory or something? But I''m not even quite sure why you need to keep this as global data.> +#endif }; static DEFINE_MUTEX(balloon_mutex); @@ -184,17 +192,173 @@ > static void balloon_alarm(unsigned long unused) > schedule_work(&balloon_worker); } -static unsigned long > current_target(void) +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG +static > inline u64 is_memory_resource_reserved(void) +{ + return > balloon_stats.hotplug_start_paddr; +} + +static int > allocate_additional_memory(unsigned long nr_pages) +{ + long rc; + > resource_size_t r_min, r_size; + struct resource *r; + struct > xen_memory_reservation reservation = { + .address_bits = 0, + > .extent_order = 0, + .domid = DOMID_SELF + }; + unsigned long flags, > i, pfn; + + if (nr_pages > ARRAY_SIZE(frame_list)) + nr_pages = > ARRAY_SIZE(frame_list); + + if (!is_memory_resource_reserved()) { + + > /* + * Look for first unused memory region starting at page + * > boundary. Skip last memory section created at boot time + * becuase it > may contains unused memory pages with PG_reserved + * bit not set > (online_pages require PG_reserved bit set). + */ + + r = > kzalloc(sizeof(struct resource), GFP_KERNEL); + + if (!r) { + rc = > -ENOMEM; + goto out_0; + } + + r->name = "System RAM"; + r->flags = > IORESOURCE_MEM | IORESOURCE_BUSY; + r_min = > PFN_PHYS(section_nr_to_pfn(pfn_to_section_nr(balloon_stats.boot_max_pfn) > + 1)); + r_size = (balloon_stats.target_pages - > balloon_stats.current_pages) << PAGE_SHIFT;So this just reserves enough resource to satisfy the current outstanding requirement? That''s OK if we can repeat it, but it looks like it will only do this once?> + + rc = allocate_resource(&iomem_resource, r, r_size, r_min, + > ULONG_MAX, PAGE_SIZE, NULL, NULL);Does this need to be section aligned, with a section size? Or is any old place OK?> + + if (rc < 0) { + kfree(r); + goto out_0; + } + + > balloon_stats.hotplug_start_paddr = r->start; + } + + > spin_lock_irqsave(&balloon_lock, flags); + + pfn = > PFN_DOWN(balloon_stats.hotplug_start_paddr + > balloon_stats.hotplug_size); + + for (i = 0; i < nr_pages; ++i, ++pfn) > + frame_list[i] = pfn; + + > set_xen_guest_handle(reservation.extent_start, frame_list); + > reservation.nr_extents = nr_pages; + + rc = > HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);Allocating all the memory here seems sub-optimal.> + + if (rc < 0) + goto out_1; + + pfn = > PFN_DOWN(balloon_stats.hotplug_start_paddr + > balloon_stats.hotplug_size); + + for (i = 0; i < rc; ++i, ++pfn) { + > BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) && + > phys_to_machine_mapping_valid(pfn)); + set_phys_to_machine(pfn, > frame_list[i]); + } + + balloon_stats.hotplug_size += rc << > PAGE_SHIFT; + balloon_stats.current_pages += rc; + +out_1: + > spin_unlock_irqrestore(&balloon_lock, flags); + +out_0: + return rc < > 0 ? rc : rc != nr_pages; +} + +static void hotplug_allocated_memory(void)Why is this done separately from the reservation/allocation from xen?> { - unsigned long target = balloon_stats.target_pages; + int nid, ret; > + struct memory_block *mem; + unsigned long pfn, pfn_limit; + + nid = > memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr);Is the entire reserved memory range guaranteed to be within one node? I see that this function has multiple definitions depending on a number of config settings. Do we care about what definition it has?> + + ret = add_registered_memory(nid, > balloon_stats.hotplug_start_paddr, + balloon_stats.hotplug_size); + + > if (ret) { + pr_err("%s: add_registered_memory: Memory hotplug failed: > %i\n", + __func__, ret); + goto error; + } + + if (xen_pv_domain()) { > + pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr); + pfn_limit = pfn > + (balloon_stats.hotplug_size >> PAGE_SHIFT); - target = min(target, - > balloon_stats.current_pages + - balloon_stats.balloon_low + - > balloon_stats.balloon_high); + for (; pfn < pfn_limit; ++pfn) + if > (!PageHighMem(pfn_to_page(pfn))) + > BUG_ON(HYPERVISOR_update_va_mapping( + (unsigned long)__va(pfn << > PAGE_SHIFT), + mfn_pte(pfn_to_mfn(pfn), PAGE_KERNEL), 0)); + } + + ret > = online_pages(PFN_DOWN(balloon_stats.hotplug_start_paddr), + > balloon_stats.hotplug_size >> PAGE_SHIFT);Are the pages available for allocation by the rest of the kernel from this point on? Which function allocates the actual page structures?> + + if (ret) { + pr_err("%s: online_pages: Failed: %i\n", __func__, > ret); + goto error; + } + + pfn = > PFN_DOWN(balloon_stats.hotplug_start_paddr); + pfn_limit = pfn + > (balloon_stats.hotplug_size >> PAGE_SHIFT); - return target; + for (; > pfn < pfn_limit; pfn += PAGES_PER_SECTION) { + mem = > find_memory_block(__pfn_to_section(pfn)); + BUG_ON(!mem); + > BUG_ON(!present_section_nr(mem->phys_index)); + > mutex_lock(&mem->state_mutex); + mem->state = MEM_ONLINE; + > mutex_unlock(&mem->state_mutex); + }What does this do? How is it different from what online_pages() does?> + + goto out; + +error: + balloon_stats.current_pages -= > balloon_stats.hotplug_size >> PAGE_SHIFT; + balloon_stats.target_pages > -= balloon_stats.hotplug_size >> PAGE_SHIFT; + +out: + > balloon_stats.hotplug_start_paddr = 0; + balloon_stats.hotplug_size = > 0; } +#else +static inline u64 is_memory_resource_reserved(void) +{ + > return 0; +} + +static inline int allocate_additional_memory(unsigned > long nr_pages) +{ + /* + * CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not > set. + * balloon_stats.target_pages could not be bigger + * than > balloon_stats.current_pages because additional + * memory allocation > is not possible. + */ + balloon_stats.target_pages = > balloon_stats.current_pages; + + return 0; +} + +static inline void > hotplug_allocated_memory(void) +{ +} +#endif /* > CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */ static int > increase_reservation(unsigned long nr_pages) { @@ -236,7 +400,7 @@ > static int increase_reservation(unsigned long nr_pages) > set_phys_to_machine(pfn, frame_list[i]); /* Link back into the page > tables if not highmem. */ - if (pfn < max_low_pfn) { + if > (xen_pv_domain() && !PageHighMem(page)) { int ret; ret = > HYPERVISOR_update_va_mapping( (unsigned long)__va(pfn << PAGE_SHIFT), > @@ -286,7 +450,7 @@ static int decrease_reservation(unsigned long > nr_pages) scrub_page(page); - if (!PageHighMem(page)) { + if > (xen_pv_domain() && !PageHighMem(page)) { ret = > HYPERVISOR_update_va_mapping( (unsigned long)__va(pfn << PAGE_SHIFT), > __pte_ma(0), 0); @@ -334,9 +498,15 @@ static void > balloon_process(struct work_struct *work) mutex_lock(&balloon_mutex); > do { - credit = current_target() - balloon_stats.current_pages; - if > (credit > 0) - need_sleep = (increase_reservation(credit) != 0); + > credit = balloon_stats.target_pages - balloon_stats.current_pages; + + > if (credit > 0) { + if (balloon_stats.balloon_low || > balloon_stats.balloon_high) + need_sleep = > (increase_reservation(credit) != 0); + else + need_sleep = > (allocate_additional_memory(credit) != 0); + } + if (credit < 0) > need_sleep = (decrease_reservation(-credit) != 0); @@ -347,8 +517,10 > @@ static void balloon_process(struct work_struct *work) } while > ((credit != 0) && !need_sleep); /* Schedule more work if there is some > still to be done. */ - if (current_target() != > balloon_stats.current_pages) + if (balloon_stats.target_pages != > balloon_stats.current_pages) mod_timer(&balloon_timer, jiffies + HZ); > + else if (is_memory_resource_reserved()) + hotplug_allocated_memory();Why can''t this be done in allocate_additional_memory()?> mutex_unlock(&balloon_mutex); } @@ -405,17 +577,27 @@ static int > __init balloon_init(void) unsigned long pfn; struct page *page; - if > (!xen_pv_domain()) + if (!xen_domain()) return -ENODEV; > pr_info("xen_balloon: Initialising balloon driver.\n"); - > balloon_stats.current_pages = min(xen_start_info->nr_pages, max_pfn); > + if (xen_pv_domain()) + balloon_stats.current_pages = > min(xen_start_info->nr_pages, max_pfn); + else + > balloon_stats.current_pages = max_pfn; + balloon_stats.target_pages = > balloon_stats.current_pages; balloon_stats.balloon_low = 0; > balloon_stats.balloon_high = 0; balloon_stats.driver_pages = 0UL; > +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG + balloon_stats.boot_max_pfn > = max_pfn; + balloon_stats.hotplug_start_paddr = 0; + > balloon_stats.hotplug_size = 0; +#endif + init_timer(&balloon_timer); > balloon_timer.data = 0; balloon_timer.function = balloon_alarm; @@ > -423,11 +605,12 @@ static int __init balloon_init(void) > register_balloon(&balloon_sysdev); /* Initialise the balloon with > excess memory space. */ - for (pfn = xen_start_info->nr_pages; pfn < > max_pfn; pfn++) { - page = pfn_to_page(pfn); - if > (!PageReserved(page)) - balloon_append(page); - } + if > (xen_pv_domain()) + for (pfn = xen_start_info->nr_pages; pfn < > max_pfn; pfn++) { + page = pfn_to_page(pfn); + if > (!PageReserved(page)) + balloon_append(page); + } > target_watch.callback = watch_target; xenstore_notifier.notifier_call > = balloon_init_watcher; diff --git a/include/linux/memory_hotplug.h > b/include/linux/memory_hotplug.h index 35b07b7..37f1894 100644 --- > a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h > @@ -202,6 +202,7 @@ static inline int > is_mem_section_removable(unsigned long pfn, } #endif /* > CONFIG_MEMORY_HOTREMOVE */ +extern int add_registered_memory(int nid, > u64 start, u64 size); extern int add_memory(int nid, u64 start, u64 > size); extern int arch_add_memory(int nid, u64 start, u64 size); > extern int remove_memory(u64 start, u64 size); diff --git > a/mm/memory_hotplug.c b/mm/memory_hotplug.c index be211a5..48a65bb > 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -481,22 > +481,13 @@ static void rollback_node_hotadd(int nid, pg_data_t *pgdat) > return; } - /* we are OK calling __meminit stuff here - we have > CONFIG_MEMORY_HOTPLUG */ -int __ref add_memory(int nid, u64 start, u64 > size) +static int __ref __add_memory(int nid, u64 start, u64 size) { > pg_data_t *pgdat = NULL; int new_pgdat = 0; - struct resource *res; > int ret; - lock_system_sleep(); - - res = > register_memory_resource(start, size); - ret = -EEXIST; - if (!res) - > goto out; - if (!node_online(nid)) { pgdat = hotadd_new_pgdat(nid, > start); ret = -ENOMEM; @@ -533,11 +524,45 @@ error: /* rollback pgdat > allocation and others */ if (new_pgdat) rollback_node_hotadd(nid, > pgdat); - if (res) - release_memory_resource(res); out: + return ret; > +} + +int __ref add_registered_memory(int nid, u64 start, u64 size) +{ > + int ret; + + lock_system_sleep(); + ret = __add_memory(nid, start, > size); unlock_system_sleep(); + + return ret; +} > +EXPORT_SYMBOL_GPL(add_registered_memory); + +int __ref add_memory(int > nid, u64 start, u64 size) +{ + int ret = -EEXIST; + struct resource > *res; + + lock_system_sleep(); + + res = > register_memory_resource(start, size); + + if (!res) + goto out; + + > ret = __add_memory(nid, start, size); + + if (!ret) + goto out; + + > release_memory_resource(res);In your earlier, patch I think you made the firmware_map_add_hotplug() be specific to add_memory, but now you have it in __add_memory. Does it make a difference either way?> + +out: + unlock_system_sleep(); + return ret; } > EXPORT_SYMBOL_GPL(add_memory);As before, this all looks reasonably good. I think the next steps should be: 1. identify how to incrementally allocate the memory from Xen, rather than doing it at hotplug time 2. identify how to disable the sysfs online interface for Xen hotplugged memory For 1., I think the code should be something like: increase_address_space(unsigned long pages) { - reserve resource for memory section - online section for each page in section { online page mark page structure allocated add page to ballooned_pages list balloon_stats.balloon_(low|high)++; } } The tricky part is making sure that the memory for the page structures has been populated so it can be used. Aside from that, there should be no need to have another call to HYPERVISOR_memory_op(XENMEM_populate_physmap, ...) aside from the existing one. Or to look at it another way, memory hotplug is the mechanism for increasing the amount of available physical address space, but ballooning is the way to increase the number of allocated pages. They are orthogonal. 2 requires a deeper understanding of the existing hotplug code. It needs to be refactored so that you can use the core hotplug machinery without enabling the sysfs page-onlining mechanism, while still leaving it available for physical hotplug. In the short term, having a boolean to disable the onlining mechanism is probably the pragmatic solution, so the balloon code can simply disable it. Thanks, J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Aug-13 00:48 UTC
Re: [Xen-devel] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
On 08/12/2010 05:46 PM, Jeremy Fitzhardinge wrote:>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index >> fad3df2..4f35eaf 100644 --- a/drivers/xen/Kconfig +++ >> b/drivers/xen/Kconfig @@ -9,6 +9,16 @@ config XEN_BALLOON the system >> to expand the domain''s memory allocation, or alternatively return >> unneeded memory to the system. +config XEN_BALLOON_MEMORY_HOTPLUG + >> bool "Xen memory balloon driver with memory hotplug support" + >> default n + depends on XEN_BALLOON && MEMORY_HOTPLUG + help + Xen >> memory balloon driver with memory hotplug support allows expanding + >> memory available for the system above limit declared at system >> startup. + It is very useful on critical systems which require long >> run without + rebooting. + config XEN_SCRUB_PAGES bool "Scrub pages >> before returning them to system" depends on XEN_BALLOON diff --git >> a/drivers/xen/balloon.c b/drivers/xen/balloon.c indexGah, what a mess. Will repost later. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vasiliy G Tolstov
2010-Aug-13 05:49 UTC
Re: [Xen-devel] [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
В Птн, 13/08/2010 в 01:20 +0200, Daniel Kiper пишет:> Hi, > > On Thu, Aug 12, 2010 at 11:44:01AM +0400, Vasiliy G Tolstov wrote: > [...] > > Can You provide patch to sles 11 sp1 ? > > I found that sles has some modification to the kernel and patch does not > > apply cleanly =(. > > Could you send me address from which I could > download source of kernel used by SLES 11 SP1 ??? > > DanielThank You for try. link ftp://ftp.suse.com/pub/projects/kernel/kotd/SLE11-SP1/src/ -- Vasiliy G Tolstov <v.tolstov@selfip.ru> Selfip.Ru _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel Kiper
2010-Aug-16 01:33 UTC
Re: [Xen-devel] [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
Hi, On Fri, Aug 13, 2010 at 09:49:52AM +0400, Vasiliy G Tolstov wrote: [...]> Thank You for try. > link ftp://ftp.suse.com/pub/projects/kernel/kotd/SLE11-SP1/src/Here is the patch which applies to Linux Kernel Ver. 2.6.32.19. (SLES kernel is based on 2.6.32 and patch applies with one warning). It compiles however I did not have time to test it. Daniel Signed-off-by: Daniel Kiper <dkiper@net-space.pl> --- diff -Npru linux-2.6.32.19.orig/drivers/xen/Kconfig linux-2.6.32.19/drivers/xen/Kconfig --- linux-2.6.32.19.orig/drivers/xen/Kconfig 2010-08-13 22:24:37.000000000 +0200 +++ linux-2.6.32.19/drivers/xen/Kconfig 2010-08-16 02:01:35.000000000 +0200 @@ -7,6 +7,16 @@ config XEN_BALLOON the system to expand the domain''s memory allocation, or alternatively return unneeded memory to the system. +config XEN_BALLOON_MEMORY_HOTPLUG + bool "Xen memory balloon driver with memory hotplug support" + default n + depends on XEN_BALLOON && MEMORY_HOTPLUG + help + Xen memory balloon driver with memory hotplug support allows expanding + memory available for the system above limit declared at system startup. + It is very useful on critical systems which require long run without + rebooting. + config XEN_SCRUB_PAGES bool "Scrub pages before returning them to system" depends on XEN_BALLOON diff -Npru linux-2.6.32.19.orig/drivers/xen/balloon.c linux-2.6.32.19/drivers/xen/balloon.c --- linux-2.6.32.19.orig/drivers/xen/balloon.c 2010-08-13 22:24:37.000000000 +0200 +++ linux-2.6.32.19/drivers/xen/balloon.c 2010-08-16 02:13:12.000000000 +0200 @@ -6,6 +6,7 @@ * Copyright (c) 2003, B Dragovic * Copyright (c) 2003-2004, M Williamson, K Fraser * Copyright (c) 2005 Dan M. Smith, IBM Corporation + * Copyright (c) 2010 Daniel Kiper * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version 2 @@ -43,6 +44,7 @@ #include <linux/mutex.h> #include <linux/list.h> #include <linux/sysdev.h> +#include <linux/memory.h> #include <asm/page.h> #include <asm/pgalloc.h> @@ -74,6 +76,11 @@ struct balloon_stats { /* Number of pages in high- and low-memory balloons. */ unsigned long balloon_low; unsigned long balloon_high; +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG + unsigned long boot_max_pfn; + u64 hotplug_start_paddr; + u64 hotplug_size; +#endif }; static DEFINE_MUTEX(balloon_mutex); @@ -181,17 +188,173 @@ static void balloon_alarm(unsigned long schedule_work(&balloon_worker); } -static unsigned long current_target(void) +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG +static inline u64 is_memory_resource_reserved(void) +{ + return balloon_stats.hotplug_start_paddr; +} + +static int allocate_additional_memory(unsigned long nr_pages) +{ + long rc; + resource_size_t r_min, r_size; + struct resource *r; + struct xen_memory_reservation reservation = { + .address_bits = 0, + .extent_order = 0, + .domid = DOMID_SELF + }; + unsigned long flags, i, pfn; + + if (nr_pages > ARRAY_SIZE(frame_list)) + nr_pages = ARRAY_SIZE(frame_list); + + if (!is_memory_resource_reserved()) { + + /* + * Look for first unused memory region starting at page + * boundary. Skip last memory section created at boot time + * becuase it may contains unused memory pages with PG_reserved + * bit not set (online_pages require PG_reserved bit set). + */ + + r = kzalloc(sizeof(struct resource), GFP_KERNEL); + + if (!r) { + rc = -ENOMEM; + goto out_0; + } + + r->name = "System RAM"; + r->flags = IORESOURCE_MEM | IORESOURCE_BUSY; + r_min = PFN_PHYS(section_nr_to_pfn(pfn_to_section_nr(balloon_stats.boot_max_pfn) + 1)); + r_size = (balloon_stats.target_pages - balloon_stats.current_pages) << PAGE_SHIFT; + + rc = allocate_resource(&iomem_resource, r, r_size, r_min, + ULONG_MAX, PAGE_SIZE, NULL, NULL); + + if (rc < 0) { + kfree(r); + goto out_0; + } + + balloon_stats.hotplug_start_paddr = r->start; + } + + spin_lock_irqsave(&balloon_lock, flags); + + pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.hotplug_size); + + for (i = 0; i < nr_pages; ++i, ++pfn) + frame_list[i] = pfn; + + set_xen_guest_handle(reservation.extent_start, frame_list); + reservation.nr_extents = nr_pages; + + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); + + if (rc < 0) + goto out_1; + + pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.hotplug_size); + + for (i = 0; i < rc; ++i, ++pfn) { + BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) && + phys_to_machine_mapping_valid(pfn)); + set_phys_to_machine(pfn, frame_list[i]); + } + + balloon_stats.hotplug_size += rc << PAGE_SHIFT; + balloon_stats.current_pages += rc; + +out_1: + spin_unlock_irqrestore(&balloon_lock, flags); + +out_0: + return rc < 0 ? rc : rc != nr_pages; +} + +static void hotplug_allocated_memory(void) +{ + int nid, ret; + struct memory_block *mem; + unsigned long pfn, pfn_limit; + + nid = memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr); + + ret = add_registered_memory(nid, balloon_stats.hotplug_start_paddr, + balloon_stats.hotplug_size); + + if (ret) { + pr_err("%s: add_registered_memory: Memory hotplug failed: %i\n", + __func__, ret); + goto error; + } + + if (xen_pv_domain()) { + pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr); + pfn_limit = pfn + (balloon_stats.hotplug_size >> PAGE_SHIFT); + + for (; pfn < pfn_limit; ++pfn) + if (!PageHighMem(pfn_to_page(pfn))) + BUG_ON(HYPERVISOR_update_va_mapping( + (unsigned long)__va(pfn << PAGE_SHIFT), + mfn_pte(pfn_to_mfn(pfn), PAGE_KERNEL), 0)); + } + + ret = online_pages(PFN_DOWN(balloon_stats.hotplug_start_paddr), + balloon_stats.hotplug_size >> PAGE_SHIFT); + + if (ret) { + pr_err("%s: online_pages: Failed: %i\n", __func__, ret); + goto error; + } + + pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr); + pfn_limit = pfn + (balloon_stats.hotplug_size >> PAGE_SHIFT); + + for (; pfn < pfn_limit; pfn += PAGES_PER_SECTION) { + mem = find_memory_block(__pfn_to_section(pfn)); + BUG_ON(!mem); + BUG_ON(!present_section_nr(mem->phys_index)); + mutex_lock(&mem->state_mutex); + mem->state = MEM_ONLINE; + mutex_unlock(&mem->state_mutex); + } + + goto out; + +error: + balloon_stats.current_pages -= balloon_stats.hotplug_size >> PAGE_SHIFT; + balloon_stats.target_pages -= balloon_stats.hotplug_size >> PAGE_SHIFT; + +out: + balloon_stats.hotplug_start_paddr = 0; + balloon_stats.hotplug_size = 0; +} +#else +static inline u64 is_memory_resource_reserved(void) +{ + return 0; +} + +static inline int allocate_additional_memory(unsigned long nr_pages) { - unsigned long target = balloon_stats.target_pages; + /* + * CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not set. + * balloon_stats.target_pages could not be bigger + * than balloon_stats.current_pages because additional + * memory allocation is not possible. + */ + balloon_stats.target_pages = balloon_stats.current_pages; - target = min(target, - balloon_stats.current_pages + - balloon_stats.balloon_low + - balloon_stats.balloon_high); + return 0; +} - return target; +static inline void hotplug_allocated_memory(void) +{ } +#endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */ static int increase_reservation(unsigned long nr_pages) { @@ -233,7 +396,7 @@ static int increase_reservation(unsigned set_phys_to_machine(pfn, frame_list[i]); /* Link back into the page tables if not highmem. */ - if (pfn < max_low_pfn) { + if (xen_pv_domain() && !PageHighMem(page)) { int ret; ret = HYPERVISOR_update_va_mapping( (unsigned long)__va(pfn << PAGE_SHIFT), @@ -283,7 +446,7 @@ static int decrease_reservation(unsigned scrub_page(page); - if (!PageHighMem(page)) { + if (xen_pv_domain() && !PageHighMem(page)) { ret = HYPERVISOR_update_va_mapping( (unsigned long)__va(pfn << PAGE_SHIFT), __pte_ma(0), 0); @@ -331,9 +494,15 @@ static void balloon_process(struct work_ mutex_lock(&balloon_mutex); do { - credit = current_target() - balloon_stats.current_pages; - if (credit > 0) - need_sleep = (increase_reservation(credit) != 0); + credit = balloon_stats.target_pages - balloon_stats.current_pages; + + if (credit > 0) { + if (balloon_stats.balloon_low || balloon_stats.balloon_high) + need_sleep = (increase_reservation(credit) != 0); + else + need_sleep = (allocate_additional_memory(credit) != 0); + } + if (credit < 0) need_sleep = (decrease_reservation(-credit) != 0); @@ -344,8 +513,10 @@ static void balloon_process(struct work_ } while ((credit != 0) && !need_sleep); /* Schedule more work if there is some still to be done. */ - if (current_target() != balloon_stats.current_pages) + if (balloon_stats.target_pages != balloon_stats.current_pages) mod_timer(&balloon_timer, jiffies + HZ); + else if (is_memory_resource_reserved()) + hotplug_allocated_memory(); mutex_unlock(&balloon_mutex); } @@ -402,17 +573,27 @@ static int __init balloon_init(void) unsigned long pfn; struct page *page; - if (!xen_pv_domain()) + if (!xen_domain()) return -ENODEV; pr_info("xen_balloon: Initialising balloon driver.\n"); - balloon_stats.current_pages = min(xen_start_info->nr_pages, max_pfn); + if (xen_pv_domain()) + balloon_stats.current_pages = min(xen_start_info->nr_pages, max_pfn); + else + balloon_stats.current_pages = max_pfn; + balloon_stats.target_pages = balloon_stats.current_pages; balloon_stats.balloon_low = 0; balloon_stats.balloon_high = 0; balloon_stats.driver_pages = 0UL; +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG + balloon_stats.boot_max_pfn = max_pfn; + balloon_stats.hotplug_start_paddr = 0; + balloon_stats.hotplug_size = 0; +#endif + init_timer(&balloon_timer); balloon_timer.data = 0; balloon_timer.function = balloon_alarm; @@ -420,11 +601,12 @@ static int __init balloon_init(void) register_balloon(&balloon_sysdev); /* Initialise the balloon with excess memory space. */ - for (pfn = xen_start_info->nr_pages; pfn < max_pfn; pfn++) { - page = pfn_to_page(pfn); - if (!PageReserved(page)) - balloon_append(page); - } + if (xen_pv_domain()) + for (pfn = xen_start_info->nr_pages; pfn < max_pfn; pfn++) { + page = pfn_to_page(pfn); + if (!PageReserved(page)) + balloon_append(page); + } target_watch.callback = watch_target; xenstore_notifier.notifier_call = balloon_init_watcher; diff -Npru linux-2.6.32.19.orig/include/linux/memory_hotplug.h linux-2.6.32.19/include/linux/memory_hotplug.h --- linux-2.6.32.19.orig/include/linux/memory_hotplug.h 2010-08-13 22:24:37.000000000 +0200 +++ linux-2.6.32.19/include/linux/memory_hotplug.h 2010-08-16 02:01:35.000000000 +0200 @@ -203,6 +203,7 @@ static inline int is_mem_section_removab } #endif /* CONFIG_MEMORY_HOTREMOVE */ +extern int add_registered_memory(int nid, u64 start, u64 size); extern int add_memory(int nid, u64 start, u64 size); extern int arch_add_memory(int nid, u64 start, u64 size); extern int remove_memory(u64 start, u64 size); diff -Npru linux-2.6.32.19.orig/mm/memory_hotplug.c linux-2.6.32.19/mm/memory_hotplug.c --- linux-2.6.32.19.orig/mm/memory_hotplug.c 2010-08-13 22:24:37.000000000 +0200 +++ linux-2.6.32.19/mm/memory_hotplug.c 2010-08-16 02:20:13.000000000 +0200 @@ -477,22 +477,13 @@ static void rollback_node_hotadd(int nid return; } - /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ -int __ref add_memory(int nid, u64 start, u64 size) +static int __ref __add_memory(int nid, u64 start, u64 size) { pg_data_t *pgdat = NULL; int new_pgdat = 0; - struct resource *res; int ret; - lock_system_sleep(); - - res = register_memory_resource(start, size); - ret = -EEXIST; - if (!res) - goto out; - if (!node_online(nid)) { pgdat = hotadd_new_pgdat(nid, start); ret = -ENOMEM; @@ -523,14 +514,48 @@ int __ref add_memory(int nid, u64 start, goto out; error: - /* rollback pgdat allocation and others */ + /* rollback pgdat allocation */ if (new_pgdat) rollback_node_hotadd(nid, pgdat); - if (res) - release_memory_resource(res); out: + return ret; +} + +int add_registered_memory(int nid, u64 start, u64 size) +{ + int ret; + + lock_system_sleep(); + ret = __add_memory(nid, start, size); unlock_system_sleep(); + + return ret; +} +EXPORT_SYMBOL_GPL(add_registered_memory); + +int add_memory(int nid, u64 start, u64 size) +{ + int ret = -EEXIST; + struct resource *res; + + lock_system_sleep(); + + res = register_memory_resource(start, size); + + if (!res) + goto out; + + ret = __add_memory(nid, start, size); + + if (!ret) + goto out; + + release_memory_resource(res); + +out: + unlock_system_sleep(); + return ret; } EXPORT_SYMBOL_GPL(add_memory); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vasiliy G Tolstov
2010-Aug-16 06:32 UTC
Re: [Xen-devel] [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
В Пнд, 16/08/2010 в 03:33 +0200, Daniel Kiper пишет:> Hi, > > On Fri, Aug 13, 2010 at 09:49:52AM +0400, Vasiliy G Tolstov wrote: > [...] > > Thank You for try. > > link ftp://ftp.suse.com/pub/projects/kernel/kotd/SLE11-SP1/src/ > > Here is the patch which applies to Linux Kernel Ver. 2.6.32.19. > (SLES kernel is based on 2.6.32 and patch applies with one > warning). It compiles however I did not have time to test it. > > Daniel >Thank You very much! I''m try it on this week. -- Vasiliy G Tolstov <v.tolstov@selfip.ru> Selfip.Ru _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel Kiper
2010-Aug-16 09:23 UTC
Re: [Xen-devel] [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
Hi, On Fri, Aug 13, 2010 at 03:19:59AM +0300, Pasi K?rkk?inen wrote: [...]> > Could you send me address from which I could > > download source of kernel used by SLES 11 SP1 ??? > > See: http://wiki.xensource.com/xenwiki/XenDom0KernelsThx. Daniel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel Kiper
2010-Aug-16 15:44 UTC
[Xen-devel] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
Hi, On Thu, Aug 12, 2010 at 05:43:33PM -0700, Jeremy Fitzhardinge wrote:> On 08/11/2010 06:22 PM, Daniel Kiper wrote: > >>Overall, this looks much better. The next step is to split this into at > >>least two patches: one for the core code, and one for the Xen bits. > >>Each patch should do just one logical operation, so if you have several > >>distinct changes to the core code, put them in separate patches. > >I will do that if this patch will be accepted. > > First step is to post it to lkml for discussion, cc:ing the relevant > maintainers. (I''m not really sure who that is at the moment. It will > take some digging around in the history.)I took all relevant addresses (sorry if I missed somebody) from MAINTAINERS file and they are in To in most of e-mails from me.> >>Can you find a clean way to prevent/disable ARCH_MEMORY_PROBE at runtime > >>when in a Xen context? > >There is no simple way to do that. It requiers to do some > >changes in drivers/base/memory.c code. I think it should > >be done as kernel boot option (on by default to not break > >things using this interface now). If it be useful for maintainers > >of mm/memory_hotplug.c and drivers/base/memory.c code then > >I could do that. Currently original arch/x86/Kconfig version > >is restored. > > I think adding a global flag which the Xen balloon driver can disable > should be sufficient. There''s no need to make an separate user-settable > control.OK.> >>>+/* we are OK calling __meminit stuff here - we have > >>>CONFIG_MEMORY_HOTPLUG > >>>*/ > >>>+static int __ref xen_add_memory(int nid, u64 start, u64 size) > >>Could this be __meminit too then? > >Good question. I looked throught the code and could > >not find any simple explanation why mm/memory_hotplug.c > >authors used __ref instead __meminit. Could you (mm/memory_hotplug.c > >authors/maintainers) tell us why ??? > > Quite possibly a left-over from something else. You could just try > making it __meminit, then compile with, erm, the option which shows you > section conflicts (it shows the number of conflicts at the end of the > kernel build by default, and tells you how to explicitly list them).Small reminder: make CONFIG_DEBUG_SECTION_MISMATCH=y I reviewed kernel source code once again. It is OK. Normaly it is not allowed to reference code/data tagged as .init.* because that sections are freed at the end of kernel boot sequence and they do not exists any more in memory. However it is sometimes required to use code/data marked .init.*. To allow that __ref tag is used and then referenced objects are not removed from memory (and no warnings are displayed during kernel compilation).> >>What''s this for? I see all its other users are in the memory hotplug > >>code, but presumably they''re concerned about a real S3 suspend. Do we > >>care about that here? > >Yes, because as I know S3 state is supported by Xen guests. > > Yes, but I''m assuming the interaction between S3 and ACPI hotplug memory > isn''t something that concerns a Xen guest; our hotplug mechanism is > completely different.Suspend/Hibernation code in Linux Kernel is platform independent to some extent and it does not require ACPI. It means that lock_system_sleep/unlock_system_sleep is required in that place to have memory state intact during suspend/hibernation.> >>>+ r->name = "System RAM"; > >>How about making it clear its Xen hotplug RAM? Or do things care about > >>the "System RAM" name? > >As I know no however as I saw anybody do not differentiate between > >normal and hotplugged memory. I thought about that ealier however > >stated that this soultion does not give us any real gain. That is why > >I decided to use standard name for hotplugged memory. > > Its cosmetic, but it would be useful to see what''s going on.If you wish I will do that, however then it should be changed as well add_registered_memory() function syntax. It should contain pointer to name published through /sys/firmware/memmap interface. I am not sure it is good solution to change add_registered_memory() function syntax which I think should be same as add_memory() function syntax.> >+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG + unsigned long > >boot_max_pfn; + u64 hotplug_start_paddr; + u64 hotplug_size; > > So does this mean you only support adding a single hotplug region? What > happens if your initial increase wasn''t enough and you want to add > more? Would you make this a list of hot-added memory or something? > > But I''m not even quite sure why you need to keep this as global data.No. It supports multiple allocations. This variables are used mostly for communication between allocate_additional_memory and hotplug_allocated_memory functions.> >PFN_PHYS(section_nr_to_pfn(pfn_to_section_nr(balloon_stats.boot_max_pfn) > >+ 1)); + r_size = (balloon_stats.target_pages - > >balloon_stats.current_pages) << PAGE_SHIFT; > > So this just reserves enough resource to satisfy the current outstanding > requirement? That''s OK if we can repeat it, but it looks like it will > only do this once?For full description of current algorithm please look at the end of this e-mail.> >+ + rc = allocate_resource(&iomem_resource, r, r_size, r_min, + > >ULONG_MAX, PAGE_SIZE, NULL, NULL); > > Does this need to be section aligned, with a section size? Or is any old > place OK?It is always PAGE_SIZE aligned and not below than <max_address_of_section_allocated_at_boot> + 1.> >reservation.nr_extents = nr_pages; + + rc > >HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); > > Allocating all the memory here seems sub-optimal.Here is allocated min(nr_pages, ARRAY_SIZE(frame_list)) of pages.> >0 ? rc : rc != nr_pages; +} + +static void hotplug_allocated_memory(void) > > Why is this done separately from the reservation/allocation from xen?First memory is allocated in batches of min(nr_pages, ARRAY_SIZE(frame_list)) of pages and then whole allocated memory is hotplugged.> >{ - unsigned long target = balloon_stats.target_pages; + int nid, ret; > >+ struct memory_block *mem; + unsigned long pfn, pfn_limit; + + nid > >memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr); > > Is the entire reserved memory range guaranteed to be within one node? > > I see that this function has multiple definitions depending on a number > of config settings. Do we care about what definition it has?As I know (maybe I have missed something) currently Xen does not support NUMA in guests and nid is always 0. However maybe it will be good to create Xen specific version of memory_add_physaddr_to_nid function.> >= online_pages(PFN_DOWN(balloon_stats.hotplug_start_paddr), + > >balloon_stats.hotplug_size >> PAGE_SHIFT); > > Are the pages available for allocation by the rest of the kernel from > this point on?Yes.> Which function allocates the actual page structures?add_registered_memory()> > >+ + if (ret) { + pr_err("%s: online_pages: Failed: %i\n", __func__, > >ret); + goto error; + } + + pfn > >PFN_DOWN(balloon_stats.hotplug_start_paddr); + pfn_limit = pfn + > >(balloon_stats.hotplug_size >> PAGE_SHIFT); - return target; + for (; > >pfn < pfn_limit; pfn += PAGES_PER_SECTION) { + mem > >find_memory_block(__pfn_to_section(pfn)); + BUG_ON(!mem); + > >BUG_ON(!present_section_nr(mem->phys_index)); + > >mutex_lock(&mem->state_mutex); + mem->state = MEM_ONLINE; + > >mutex_unlock(&mem->state_mutex); + } > > What does this do? How is it different from what online_pages() does?This updates /sys/devices/system/memory/memory*/state files which contain information about states of sections.> >+ else if (is_memory_resource_reserved()) + hotplug_allocated_memory(); > > Why can''t this be done in allocate_additional_memory()?Because memory is allocated in relatively small batches and then whole memory is hotplugged.> In your earlier, patch I think you made the firmware_map_add_hotplug() > be specific to add_memory, but now you have it in __add_memory. Does it > make a difference either way?It was not available in Linux Kernel Ver. 2.6.32.* on which based first versions of this patch. It updates /sys/firmware/memmap.> As before, this all looks reasonably good. I think the next steps > should be: > > 1. identify how to incrementally allocate the memory from Xen, rather > than doing it at hotplug time > 2. identify how to disable the sysfs online interface for Xen > hotplugged memory > > For 1., I think the code should be something like: > > increase_address_space(unsigned long pages) > { > - reserve resource for memory section > - online section > for each page in section { > online page > mark page structure allocated > add page to ballooned_pages list > balloon_stats.balloon_(low|high)++; > } > }Here is current algorithm: - allocate_resource() with size requested by user, - allocate memory in relatively small batches, - add_registered_memory(), - online_pages(), - update /sys/devices/system/memory/memory*/state files.> The tricky part is making sure that the memory for the page structures > has been populated so it can be used. Aside from that, there should be > no need to have another call to > HYPERVISOR_memory_op(XENMEM_populate_physmap, ...) aside from the > existing one.Currently it is.> 2 requires a deeper understanding of the existing hotplug code. It > needs to be refactored so that you can use the core hotplug machinery > without enabling the sysfs page-onlining mechanism, while still leaving > it available for physical hotplug. In the short term, having a boolean > to disable the onlining mechanism is probably the pragmatic solution, so > the balloon code can simply disable it.I think that sysfs should stay intact because it contains some useful information for admins. We should reconsider avaibilty of /sys/devices/system/memory/probe. In physical systems it is available however usage without real hotplug support lead to big crash. I am not sure we should disable probe in Xen. Maybe it is better to stay in sync with standard behavior. Second solution is to prepare an interface (kernel option or only some enable/disable functions) which give possibilty to enable/disable probe interface when it is required. Daniel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Aug-25 21:33 UTC
[Xen-devel] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
On 08/16/2010 08:44 AM, Daniel Kiper wrote:> Hi, > > On Thu, Aug 12, 2010 at 05:43:33PM -0700, Jeremy Fitzhardinge wrote: >> On 08/11/2010 06:22 PM, Daniel Kiper wrote: >>>> Overall, this looks much better. The next step is to split this into at >>>> least two patches: one for the core code, and one for the Xen bits. >>>> Each patch should do just one logical operation, so if you have several >>>> distinct changes to the core code, put them in separate patches. >>> I will do that if this patch will be accepted. >> First step is to post it to lkml for discussion, cc:ing the relevant >> maintainers. (I''m not really sure who that is at the moment. It will >> take some digging around in the history.) > I took all relevant addresses (sorry if I missed somebody) from MAINTAINERS > file and they are in To in most of e-mails from me.Unfortunately MAINTAINERS is often poorly maintained. While you should include those addresses, its also worth looking at the git history to see who has been active in that area recently.>>>> Can you find a clean way to prevent/disable ARCH_MEMORY_PROBE at runtime >>>> when in a Xen context? >>> There is no simple way to do that. It requiers to do some >>> changes in drivers/base/memory.c code. I think it should >>> be done as kernel boot option (on by default to not break >>> things using this interface now). If it be useful for maintainers >>> of mm/memory_hotplug.c and drivers/base/memory.c code then >>> I could do that. Currently original arch/x86/Kconfig version >>> is restored. >> I think adding a global flag which the Xen balloon driver can disable >> should be sufficient. There''s no need to make an separate user-settable >> control. > OK. > >>>>> +/* we are OK calling __meminit stuff here - we have >>>>> CONFIG_MEMORY_HOTPLUG >>>>> */ >>>>> +static int __ref xen_add_memory(int nid, u64 start, u64 size) >>>> Could this be __meminit too then? >>> Good question. I looked throught the code and could >>> not find any simple explanation why mm/memory_hotplug.c >>> authors used __ref instead __meminit. Could you (mm/memory_hotplug.c >>> authors/maintainers) tell us why ??? >> Quite possibly a left-over from something else. You could just try >> making it __meminit, then compile with, erm, the option which shows you >> section conflicts (it shows the number of conflicts at the end of the >> kernel build by default, and tells you how to explicitly list them). > Small reminder: make CONFIG_DEBUG_SECTION_MISMATCH=y > > I reviewed kernel source code once again. It is OK. Normaly it is > not allowed to reference code/data tagged as .init.* because > that sections are freed at the end of kernel boot sequence and > they do not exists any more in memory. However it is sometimes > required to use code/data marked .init.*. To allow that __ref > tag is used and then referenced objects are not removed from > memory (and no warnings are displayed during kernel compilation). > >>>> What''s this for? I see all its other users are in the memory hotplug >>>> code, but presumably they''re concerned about a real S3 suspend. Do we >>>> care about that here? >>> Yes, because as I know S3 state is supported by Xen guests. >> Yes, but I''m assuming the interaction between S3 and ACPI hotplug memory >> isn''t something that concerns a Xen guest; our hotplug mechanism is >> completely different. > Suspend/Hibernation code in Linux Kernel is platform independent > to some extent and it does not require ACPI. It means that > lock_system_sleep/unlock_system_sleep is required in that > place to have memory state intact during suspend/hibernation.My question is more along the lines of whether there''s an *inherent* dependency/interaction between suspend/hibernate and hotplug memory, or whether the interaction is a side-effect of the x86 implementation. But it doesn''t really matter either way for our purposes.>>>>> + r->name = "System RAM"; >>>> How about making it clear its Xen hotplug RAM? Or do things care about >>>> the "System RAM" name? >>> As I know no however as I saw anybody do not differentiate between >>> normal and hotplugged memory. I thought about that ealier however >>> stated that this soultion does not give us any real gain. That is why >>> I decided to use standard name for hotplugged memory. >> Its cosmetic, but it would be useful to see what''s going on. > If you wish I will do that, however then it should be changed > as well add_registered_memory() function syntax. It should > contain pointer to name published through /sys/firmware/memmap > interface. I am not sure it is good solution to change > add_registered_memory() function syntax which I think should be > same as add_memory() function syntax.OK, fair enough.>>> +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG + unsigned long >>> boot_max_pfn; + u64 hotplug_start_paddr; + u64 hotplug_size; >> So does this mean you only support adding a single hotplug region? What >> happens if your initial increase wasn''t enough and you want to add >> more? Would you make this a list of hot-added memory or something? >> >> But I''m not even quite sure why you need to keep this as global data. > No. It supports multiple allocations. This variables are > used mostly for communication between allocate_additional_memory > and hotplug_allocated_memory functions.Using globals to communicate values between two functions is not generally good practice. Could the code be restructured to avoid it (by passing them as parameters, for example)?>>> + else if (is_memory_resource_reserved()) + hotplug_allocated_memory(); >> Why can''t this be done in allocate_additional_memory()? > Because memory is allocated in relatively small > batches and then whole memory is hotplugged.Is the batch allocation just to avoid having a single great big piece, or is there some other deeper reason? If not, I don''t see why that detail can''t be hidden in an inner loop.> Here is current algorithm: > - allocate_resource() with size requested by user, > - allocate memory in relatively small batches, > - add_registered_memory(), > - online_pages(), > - update /sys/devices/system/memory/memory*/state files.That''s OK as far as it goes, but I do tend to see memory hotplug as an underlying implementation detail rather than something that should be directly exposed to users (ie memory hotplug as the mechanism to allow ballooning to expand beyond the initial domain size).>> The tricky part is making sure that the memory for the page structures >> has been populated so it can be used. Aside from that, there should be >> no need to have another call to >> HYPERVISOR_memory_op(XENMEM_populate_physmap, ...) aside from the >> existing one. > Currently it is. > >> 2 requires a deeper understanding of the existing hotplug code. It >> needs to be refactored so that you can use the core hotplug machinery >> without enabling the sysfs page-onlining mechanism, while still leaving >> it available for physical hotplug. In the short term, having a boolean >> to disable the onlining mechanism is probably the pragmatic solution, so >> the balloon code can simply disable it. > I think that sysfs should stay intact because it contains some > useful information for admins. We should reconsider avaibilty > of /sys/devices/system/memory/probe. In physical systems it > is available however usage without real hotplug support > lead to big crash. I am not sure we should disable probe in Xen. > Maybe it is better to stay in sync with standard behavior. > Second solution is to prepare an interface (kernel option > or only some enable/disable functions) which give possibilty > to enable/disable probe interface when it is required.My understanding is that on systems with real physical hotplug memory, the process is: 1. you insert/enable a DIMM or whatever to make the memory electrically active 2. the kernel notices this and generates a udev event 3. a usermode script sees this and, according to whatever policy it wants to implement, choose to online the memory at some point I''m concerned that if we partially implement this but leave "online" as a timebomb then existing installs with hotplug scripts in place may poke at it - thinking they''re dealing with physical hotplug - and cause problems. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
KAMEZAWA Hiroyuki
2010-Aug-26 01:23 UTC
[Xen-devel] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
On Wed, 25 Aug 2010 14:33:06 -0700 Jeremy Fitzhardinge <jeremy@goop.org> wrote:> >> 2 requires a deeper understanding of the existing hotplug code. It > >> needs to be refactored so that you can use the core hotplug machinery > >> without enabling the sysfs page-onlining mechanism, while still leaving > >> it available for physical hotplug. In the short term, having a boolean > >> to disable the onlining mechanism is probably the pragmatic solution, so > >> the balloon code can simply disable it. > > I think that sysfs should stay intact because it contains some > > useful information for admins. We should reconsider avaibilty > > of /sys/devices/system/memory/probe. In physical systems it > > is available however usage without real hotplug support > > lead to big crash. I am not sure we should disable probe in Xen. > > Maybe it is better to stay in sync with standard behavior. > > Second solution is to prepare an interface (kernel option > > or only some enable/disable functions) which give possibilty > > to enable/disable probe interface when it is required. > > My understanding is that on systems with real physical hotplug memory, > the process is: > > 1. you insert/enable a DIMM or whatever to make the memory > electrically active > 2. the kernel notices this and generates a udev event > 3. a usermode script sees this and, according to whatever policy it > wants to implement, choose to online the memory at some point > > I''m concerned that if we partially implement this but leave "online" as > a timebomb then existing installs with hotplug scripts in place may poke > at it - thinking they''re dealing with physical hotplug - and cause problems. >IIUC, IBM guys, using LPAR?, does memory hotplug on VM. The operation is. 1. tell the region of memory to be added to a userland daemon. 2. The daemon write 0xXXXXXX > /sys/devices/system/memory/probe (This notifies that memory is added physically.) Here, memory is created. 3. Then, online memory. I think VM guys can use similar method rather than simulating phyiscal hotplug. Then, you don''t have to worry about udev etc... No ? Thanks, -Kame _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Apparently Analagous Threads
- [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - second fully working version - once again
- [PATCH v8, part3 12/14] mm: correctly update zone->mamaged_pages
- [PATCH v8, part3 12/14] mm: correctly update zone->mamaged_pages
- [PATCH v8, part3 12/14] mm: correctly update zone->mamaged_pages
- [PATCH R3 4/7] xen/balloon: Migration from mod_timer() to schedule_delayed_work()