Dan Magenheimer
2011-Jun-06 22:12 UTC
[Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
Hi Konrad (and anyone else interested in reviewing) -- Here is the patch for self-ballooning and frontswap-selfshrinking on top of 2.6.39, and on top of the frontswap commit-set you recently merged into your tree. Since this is the first time I''ve ported this code to a kernel later than 2.6.34(?), and since this code hasn''t gotten formal review even though it''s been floating about in various forms for 18 months, I thought I would post it publicly for review, rather than just ask for a pull. (I''ll put it in a git tree after a round or two of feedback.) This code complements the cleancache and frontswap patchsets to optimize support for Xen Transcendent Memory. The policy it implements is rudimentary, so will almost certainly need to improve over time. (Sorry if this line-wraps... attached also...) Thanks, Dan P.S. Anybody who actually uses this patch is encouraged to read http://oss.oracle.com/projects/tmem/dist/documentation/internals/linuxpatch from which the following brief documentation was extracted, which will probably be the git commit comment. [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking Selfballooning creates memory pressure by using the Xen balloon driver to decrease and increase available kernel memory, tracking a target value of "Committed_AS" (see /proc/meminfo). Sysfs tunables are provided to adjust the frequency at which memory size is adjusted, the rate at which ballooning is used to approach the target, and to disable selfballooning completely. As of 100827, selfballooning runs in a separate kernel thread, "selfballooning". Frontswap shrinking accounts for the fact that pages swapped to disk may sit on disk for a very long time, even if very rarely used or if the kernel knows they will never be used again. This is because the kernel need not reclaim disk space because the disk space costs very little and can be overwritten when necessary. When the same pages are in frontswap, however, they are taking up valuable RAM real estate. So when frontswap activity is otherwise stable and the guest kernel is not under memory pressure, the frontswap shrinker removes some pages from frontswap and returns them to kernel memory. Sysfs tunables are provided to adjust the frequency of shrinking opportunities and the shrinking rate, and to create more "inertia". Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> drivers/xen/Kconfig | 10 +++ drivers/xen/balloon.c | 109 ++++++++++++++++++++++++++ drivers/xen/tmem.c | 1 + drivers/xen/xen-balloon.c | 191 ++++++++++++++++++++++++++++++++++++++++++++- include/xen/balloon.h | 11 +++ mm/mmap.c | 7 ++ 6 files changed, 328 insertions(+), 1 deletions(-) == diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index d589fe7..8f21fad 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -9,6 +9,16 @@ config XEN_BALLOON the system to expand the domain''s memory allocation, or alternatively return unneeded memory to the system. +config XEN_SELFBALLOONING + bool "dynamically self-balloon kernel memory to target" + depends on XEN && XEN_BALLOON && CLEANCACHE + default y + help + Self-ballooning dynamically balloons available kernel memory driven + by the current usage of anonymous memory ("committed AS") and + controlled by various sysfs-settable parameters. May be overridden + by the noselfballooning kernel boot parameter + config XEN_SCRUB_PAGES bool "Scrub pages before returning them to system" depends on XEN_BALLOON diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index f54290b..a35b056 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -56,6 +56,7 @@ #include <xen/balloon.h> #include <xen/features.h> #include <xen/page.h> +#include <linux/frontswap.h> /* * balloon_process() state: @@ -74,6 +75,14 @@ enum bp_state { static DEFINE_MUTEX(balloon_mutex); +#ifdef CONFIG_XEN_SELFBALLOONING +extern int tmem_enabled; +static int use_selfballooning __read_mostly = 1; +#ifdef CONFIG_FRONTSWAP +static int use_frontswap_selfshrink __read_mostly = 1; +#endif +#endif + struct balloon_stats balloon_stats; EXPORT_SYMBOL_GPL(balloon_stats); @@ -94,6 +103,10 @@ static LIST_HEAD(ballooned_pages); /* Main work function, always executed in process context. */ static void balloon_process(struct work_struct *work); static DECLARE_DELAYED_WORK(balloon_worker, balloon_process); +#ifdef CONFIG_XEN_SELFBALLOONING +static void selfballoon_process(struct work_struct *work); +DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process); +#endif /* When ballooning out (allocating memory to return to Xen) we don''t really want the kernel to try too hard since that can trigger the oom killer. */ @@ -428,6 +441,86 @@ void free_xenballooned_pages(int nr_pages, struct page** pages) } EXPORT_SYMBOL(free_xenballooned_pages); +#ifdef CONFIG_XEN_SELFBALLOONING +#ifdef CONFIG_FRONTSWAP +#define frontswap_selfshrinking_enabled balloon_stats.frontswap_selfshrinking + +unsigned long frontswap_inertia_counter = 0; + +static void frontswap_selfshrink(void) +{ + static unsigned long cur_frontswap_pages = 0; + static unsigned long last_frontswap_pages = 0; + static unsigned long tgt_frontswap_pages = 0; + + if (!balloon_stats.frontswap_selfshrinking) + return; + if (!balloon_stats.frontswap_hysteresis) + return; + last_frontswap_pages = cur_frontswap_pages; + cur_frontswap_pages = frontswap_curr_pages(); + if (!cur_frontswap_pages || (cur_frontswap_pages > last_frontswap_pages)) { + frontswap_inertia_counter = balloon_stats.frontswap_inertia; + return; + } + if (frontswap_inertia_counter && --frontswap_inertia_counter) + return; + //frontswap_inertia_counter = balloon_stats.frontswap_inertia; + if ( cur_frontswap_pages <= balloon_stats.frontswap_hysteresis) + tgt_frontswap_pages = 0; + else tgt_frontswap_pages = cur_frontswap_pages - + (cur_frontswap_pages / balloon_stats.frontswap_hysteresis); + frontswap_shrink(tgt_frontswap_pages); +} + +static int __init no_frontswap_selfshrink_setup(char *s) +{ + use_frontswap_selfshrink = 0; + return 1; +} + +__setup("noselfshrink", no_frontswap_selfshrink_setup); +#else +#define frontswap_selfshrink() do {} while (0) +#define frontswap_selfshrinking_enabled 0 +#endif + +static void selfballoon_process(struct work_struct *work) +{ + extern unsigned long vm_get_committed_as(void); + unsigned long cur_pages, goal_pages, tgt_pages; + int reset_timer = 0; + + if (balloon_stats.selfballooning_enabled) { + tgt_pages = cur_pages = totalram_pages; + goal_pages = vm_get_committed_as(); + if (cur_pages > goal_pages) + tgt_pages = cur_pages - + (cur_pages - goal_pages) / balloon_stats.selfballoon_downhysteresis; + else if (cur_pages < goal_pages) + tgt_pages = cur_pages + + (goal_pages - cur_pages) / balloon_stats.selfballoon_uphysteresis; + balloon_set_new_target(tgt_pages); + reset_timer = 1; + } + if (frontswap_selfshrinking_enabled) { + frontswap_selfshrink(); + reset_timer = 1; + } + if (reset_timer) + schedule_delayed_work(&selfballoon_worker, + balloon_stats.selfballoon_interval * HZ); +} + +static int __init noselfballooning_setup(char *s) +{ + use_selfballooning = 0; + return 1; +} + +__setup("noselfballooning", noselfballooning_setup); +#endif + static int __init balloon_init(void) { unsigned long pfn, extra_pfn_end; @@ -468,6 +561,22 @@ static int __init balloon_init(void) __balloon_append(page); } +#ifdef CONFIG_XEN_SELFBALLOONING + balloon_stats.selfballooning_enabled = tmem_enabled && + use_selfballooning; + balloon_stats.selfballoon_interval = 5; + balloon_stats.selfballoon_downhysteresis = 8; + balloon_stats.selfballoon_uphysteresis = 1; +#ifdef CONFIG_FRONTSWAP + balloon_stats.frontswap_selfshrinking + use_frontswap_selfshrink && frontswap_enabled; + balloon_stats.frontswap_hysteresis = 20; + balloon_stats.frontswap_inertia = 3; +#endif + schedule_delayed_work(&selfballoon_worker, + balloon_stats.selfballoon_interval * HZ); +#endif + return 0; } diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c index ad6a8f4..de740d2 100644 --- a/drivers/xen/tmem.c +++ b/drivers/xen/tmem.c @@ -124,6 +124,7 @@ static int xen_tmem_flush_object(u32 pool_id, struct tmem_oid oid) } int tmem_enabled; +EXPORT_SYMBOL(tmem_enabled); static int __init enable_tmem(char *s) { diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c index a4ff225..5e30fc9 100644 --- a/drivers/xen/xen-balloon.c +++ b/drivers/xen/xen-balloon.c @@ -190,6 +190,184 @@ static ssize_t store_target(struct sys_device *dev, static SYSDEV_ATTR(target, S_IRUGO | S_IWUSR, show_target, store_target); +#ifdef CONFIG_XEN_SELFBALLOONING +extern struct delayed_work selfballoon_worker; + +static ssize_t show_selfballooning(struct sys_device *dev, struct sysdev_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", balloon_stats.selfballooning_enabled); +} + +static ssize_t store_selfballooning(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, + size_t count) +{ + char *endchar; + int was_enabled = balloon_stats.selfballooning_enabled; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + balloon_stats.selfballooning_enabled = !!memparse(buf, &endchar); + + if (!was_enabled && balloon_stats.selfballooning_enabled) + schedule_delayed_work(&selfballoon_worker, + balloon_stats.selfballoon_interval * HZ); + + return count; +} + +static SYSDEV_ATTR(selfballooning, S_IRUGO | S_IWUSR, + show_selfballooning, store_selfballooning); + +static ssize_t show_selfballoon_interval(struct sys_device *dev, struct sysdev_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", balloon_stats.selfballoon_interval); +} + +static ssize_t store_selfballoon_interval(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, + size_t count) +{ + char *endchar; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + balloon_stats.selfballoon_interval = memparse(buf, &endchar); + return count; +} + +static SYSDEV_ATTR(selfballoon_interval, S_IRUGO | S_IWUSR, + show_selfballoon_interval, store_selfballoon_interval); + +static ssize_t show_selfballoon_downhys(struct sys_device *dev, struct sysdev_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", balloon_stats.selfballoon_downhysteresis); +} + +static ssize_t store_selfballoon_downhys(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, + size_t count) +{ + char *endchar; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + balloon_stats.selfballoon_downhysteresis = memparse(buf, &endchar); + return count; +} + +static SYSDEV_ATTR(selfballoon_downhysteresis, S_IRUGO | S_IWUSR, + show_selfballoon_downhys, store_selfballoon_downhys); + + +static ssize_t show_selfballoon_uphys(struct sys_device *dev, struct sysdev_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", balloon_stats.selfballoon_uphysteresis); +} + +static ssize_t store_selfballoon_uphys(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, + size_t count) +{ + char *endchar; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + balloon_stats.selfballoon_uphysteresis = memparse(buf, &endchar); + return count; +} + +static SYSDEV_ATTR(selfballoon_uphysteresis, S_IRUGO | S_IWUSR, + show_selfballoon_uphys, store_selfballoon_uphys); + +#ifdef CONFIG_FRONTSWAP +static ssize_t show_frontswap_selfshrinking(struct sys_device *dev, struct sysdev_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", balloon_stats.frontswap_selfshrinking); +} + +static ssize_t store_frontswap_selfshrinking(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, + size_t count) +{ + char *endchar; + int was_enabled = balloon_stats.frontswap_selfshrinking; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + balloon_stats.frontswap_selfshrinking = !!memparse(buf, &endchar); + + if (!was_enabled && !balloon_stats.selfballooning_enabled && + balloon_stats.frontswap_selfshrinking) + schedule_delayed_work(&selfballoon_worker, + balloon_stats.selfballoon_interval * HZ); + + return count; +} + +static SYSDEV_ATTR(frontswap_selfshrinking, S_IRUGO | S_IWUSR, + show_frontswap_selfshrinking, store_frontswap_selfshrinking); + +static ssize_t show_frontswap_inertia(struct sys_device *dev, struct sysdev_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", balloon_stats.frontswap_inertia); +} + +static ssize_t store_frontswap_inertia(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, + size_t count) +{ + char *endchar; + extern unsigned long frontswap_inertia_counter; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + balloon_stats.frontswap_inertia = memparse(buf, &endchar); + frontswap_inertia_counter = balloon_stats.frontswap_inertia; + return count; +} + +static SYSDEV_ATTR(frontswap_inertia, S_IRUGO | S_IWUSR, + show_frontswap_inertia, store_frontswap_inertia); + +static ssize_t show_frontswap_hysteresis(struct sys_device *dev, struct sysdev_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", balloon_stats.frontswap_hysteresis); +} + +static ssize_t store_frontswap_hysteresis(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, + size_t count) +{ + char *endchar; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + balloon_stats.frontswap_hysteresis = memparse(buf, &endchar); + return count; +} + +static SYSDEV_ATTR(frontswap_hysteresis, S_IRUGO | S_IWUSR, + show_frontswap_hysteresis, store_frontswap_hysteresis); + +#endif /* CONFIG_FRONTSWAP */ +#endif /* CONFIG_XEN_SELFBALLOONING */ static struct sysdev_attribute *balloon_attrs[] = { &attr_target_kb, @@ -197,7 +375,18 @@ static struct sysdev_attribute *balloon_attrs[] = { &attr_schedule_delay.attr, &attr_max_schedule_delay.attr, &attr_retry_count.attr, - &attr_max_retry_count.attr + &attr_max_retry_count.attr, +#ifdef CONFIG_XEN_SELFBALLOONING + &attr_selfballooning, + &attr_selfballoon_interval, + &attr_selfballoon_downhysteresis, + &attr_selfballoon_uphysteresis, +#ifdef CONFIG_FRONTSWAP + &attr_frontswap_selfshrinking, + &attr_frontswap_hysteresis, + &attr_frontswap_inertia, +#endif +#endif }; static struct attribute *balloon_info_attrs[] = { diff --git a/include/xen/balloon.h b/include/xen/balloon.h index a2b22f0..aa36b82 100644 --- a/include/xen/balloon.h +++ b/include/xen/balloon.h @@ -15,6 +15,17 @@ struct balloon_stats { unsigned long max_schedule_delay; unsigned long retry_count; unsigned long max_retry_count; +#ifdef CONFIG_XEN_SELFBALLOONING + int selfballooning_enabled; + unsigned int selfballoon_interval; + unsigned int selfballoon_downhysteresis; + unsigned int selfballoon_uphysteresis; +#ifdef CONFIG_FRONTSWAP + unsigned int frontswap_selfshrinking; + unsigned int frontswap_hysteresis; + unsigned int frontswap_inertia; +#endif +#endif }; extern struct balloon_stats balloon_stats; diff --git a/mm/mmap.c b/mm/mmap.c index 772140c..030a47b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -89,6 +89,13 @@ int sysctl_overcommit_ratio = 50; /* default is 50% */ int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; struct percpu_counter vm_committed_as; +unsigned long vm_get_committed_as(void) +{ + return percpu_counter_read_positive(&vm_committed_as); + +} +EXPORT_SYMBOL(vm_get_committed_as); + /* * Check that a process has enough memory to allocate a new virtual * mapping. 0 means there is enough memory for the allocation to _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jun-07 09:34 UTC
Re: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
On Mon, 2011-06-06 at 23:12 +0100, Dan Magenheimer wrote:> Hi Konrad (and anyone else interested in reviewing) -- > > Here is the patch for self-ballooning and frontswap-selfshrinking on > top of 2.6.39, and on top of the frontswap commit-set you recently > merged into your tree. > > Since this is the first time I''ve ported this code to a kernel > later than 2.6.34(?), and since this code hasn''t gotten formal > review even though it''s been floating about in various forms > for 18 months, I thought I would post it publicly for review, > rather than just ask for a pull. (I''ll put it in a git tree after > a round or two of feedback.) > > This code complements the cleancache and frontswap patchsets to > optimize support for Xen Transcendent Memory. The policy it > implements is rudimentary, so will almost certainly need to > improve over time. > > (Sorry if this line-wraps... attached also...) > > Thanks, > Dan > > P.S. Anybody who actually uses this patch is encouraged to read > http://oss.oracle.com/projects/tmem/dist/documentation/internals/linuxpatch > from which the following brief documentation was extracted, > which will probably be the git commit comment. > > > > [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking > > Selfballooning creates memory pressure by using the Xen balloon driver > to decrease and increase available kernel memory, tracking a target value > of "Committed_AS" (see /proc/meminfo). Sysfs tunables are provided to > adjust the frequency at which memory size is adjusted, the rate at > which ballooning is used to approach the target, and to disable > selfballooning completely. As of 100827, selfballooning runs in a > separate kernel thread, "selfballooning". > > Frontswap shrinking accounts for the fact that pages swapped to disk > may sit on disk for a very long time, even if very rarely used or if > the kernel knows they will never be used again. This is because > the kernel need not reclaim disk space because the disk space costs > very little and can be overwritten when necessary. When the same > pages are in frontswap, however, they are taking up valuable RAM > real estate. So when frontswap activity is otherwise stable and the > guest kernel is not under memory pressure, the frontswap shrinker > removes some pages from frontswap and returns them to kernel memory. > Sysfs tunables are provided to adjust the frequency of shrinking > opportunities and the shrinking rate, and to create more "inertia". > > Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>checkpatch.pl says: total: 8 errors, 14 warnings, 395 lines checked most of them look valid to me. (I commented on a few below before it became clear you hadn''t run it yourself)> drivers/xen/Kconfig | 10 +++ > drivers/xen/balloon.c | 109 ++++++++++++++++++++++++++ > drivers/xen/tmem.c | 1 + > drivers/xen/xen-balloon.c | 191 ++++++++++++++++++++++++++++++++++++++++++++- > include/xen/balloon.h | 11 +++ > mm/mmap.c | 7 ++ > 6 files changed, 328 insertions(+), 1 deletions(-) > > ==> > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig > index d589fe7..8f21fad 100644 > --- a/drivers/xen/Kconfig > +++ b/drivers/xen/Kconfig > @@ -9,6 +9,16 @@ config XEN_BALLOON > the system to expand the domain''s memory allocation, or alternatively > return unneeded memory to the system. > > +config XEN_SELFBALLOONING > + bool "dynamically self-balloon kernel memory to target" > + depends on XEN && XEN_BALLOON && CLEANCACHE > + default y > + help > + Self-ballooning dynamically balloons available kernel memory driven > + by the current usage of anonymous memory ("committed AS") and > + controlled by various sysfs-settable parameters. May be overridden > + by the noselfballooning kernel boot parameter > + > config XEN_SCRUB_PAGES > bool "Scrub pages before returning them to system" > depends on XEN_BALLOON > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > index f54290b..a35b056 100644 > --- a/drivers/xen/balloon.c > +++ b/drivers/xen/balloon.c > @@ -56,6 +56,7 @@ > #include <xen/balloon.h> > #include <xen/features.h> > #include <xen/page.h> > +#include <linux/frontswap.h> > > /* > * balloon_process() state: > @@ -74,6 +75,14 @@ enum bp_state { > > static DEFINE_MUTEX(balloon_mutex); > > +#ifdef CONFIG_XEN_SELFBALLOONING > +extern int tmem_enabled;No externs in .c files please.> +static int use_selfballooning __read_mostly = 1; > +#ifdef CONFIG_FRONTSWAP > +static int use_frontswap_selfshrink __read_mostly = 1; > +#endif > +#endif > + > struct balloon_stats balloon_stats; > EXPORT_SYMBOL_GPL(balloon_stats); > > @@ -94,6 +103,10 @@ static LIST_HEAD(ballooned_pages); > /* Main work function, always executed in process context. */ > static void balloon_process(struct work_struct *work); > static DECLARE_DELAYED_WORK(balloon_worker, balloon_process); > +#ifdef CONFIG_XEN_SELFBALLOONING > +static void selfballoon_process(struct work_struct *work); > +DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process); > +#endif > > /* When ballooning out (allocating memory to return to Xen) we don''t really > want the kernel to try too hard since that can trigger the oom killer. */ > @@ -428,6 +441,86 @@ void free_xenballooned_pages(int nr_pages, struct page** pages) > } > EXPORT_SYMBOL(free_xenballooned_pages); > > +#ifdef CONFIG_XEN_SELFBALLOONING > +#ifdef CONFIG_FRONTSWAP > +#define frontswap_selfshrinking_enabled balloon_stats.frontswap_selfshrinking > + > +unsigned long frontswap_inertia_counter = 0;static?> + > +static void frontswap_selfshrink(void) > +{ > + static unsigned long cur_frontswap_pages = 0; > + static unsigned long last_frontswap_pages = 0; > + static unsigned long tgt_frontswap_pages = 0; > + > + if (!balloon_stats.frontswap_selfshrinking) > + return; > + if (!balloon_stats.frontswap_hysteresis) > + return; > + last_frontswap_pages = cur_frontswap_pages; > + cur_frontswap_pages = frontswap_curr_pages(); > + if (!cur_frontswap_pages || (cur_frontswap_pages > last_frontswap_pages)) { > + frontswap_inertia_counter = balloon_stats.frontswap_inertia; > + return; > + } > + if (frontswap_inertia_counter && --frontswap_inertia_counter) > + return; > + //frontswap_inertia_counter = balloon_stats.frontswap_inertia;Please drop this.> + if ( cur_frontswap_pages <= balloon_stats.frontswap_hysteresis) > + tgt_frontswap_pages = 0; > + else tgt_frontswap_pages = cur_frontswap_pages - > + (cur_frontswap_pages / balloon_stats.frontswap_hysteresis); > + frontswap_shrink(tgt_frontswap_pages); > +} > + > +static int __init no_frontswap_selfshrink_setup(char *s) > +{ > + use_frontswap_selfshrink = 0; > + return 1; > +} > + > +__setup("noselfshrink", no_frontswap_selfshrink_setup); > +#else > +#define frontswap_selfshrink() do {} while (0) > +#define frontswap_selfshrinking_enabled 0 > +#endif > + > +static void selfballoon_process(struct work_struct *work) > +{ > + extern unsigned long vm_get_committed_as(void);In a header please.> + unsigned long cur_pages, goal_pages, tgt_pages; > + int reset_timer = 0; > + > + if (balloon_stats.selfballooning_enabled) { > + tgt_pages = cur_pages = totalram_pages; > + goal_pages = vm_get_committed_as(); > + if (cur_pages > goal_pages) > + tgt_pages = cur_pages - > + (cur_pages - goal_pages) / balloon_stats.selfballoon_downhysteresis; > + else if (cur_pages < goal_pages) > + tgt_pages = cur_pages + > + (goal_pages - cur_pages) / balloon_stats.selfballoon_uphysteresis; > + balloon_set_new_target(tgt_pages); > + reset_timer = 1; > + } > + if (frontswap_selfshrinking_enabled) { > + frontswap_selfshrink(); > + reset_timer = 1; > + } > + if (reset_timer) > + schedule_delayed_work(&selfballoon_worker, > + balloon_stats.selfballoon_interval * HZ); > +} > + > +static int __init noselfballooning_setup(char *s) > +{ > + use_selfballooning = 0; > + return 1; > +} > + > +__setup("noselfballooning", noselfballooning_setup); > +#endif > + > static int __init balloon_init(void) > { > unsigned long pfn, extra_pfn_end; > @@ -468,6 +561,22 @@ static int __init balloon_init(void) > __balloon_append(page); > } > > +#ifdef CONFIG_XEN_SELFBALLOONING > + balloon_stats.selfballooning_enabled = tmem_enabled && > + use_selfballooning; > + balloon_stats.selfballoon_interval = 5; > + balloon_stats.selfballoon_downhysteresis = 8; > + balloon_stats.selfballoon_uphysteresis = 1; > +#ifdef CONFIG_FRONTSWAP > + balloon_stats.frontswap_selfshrinking > + use_frontswap_selfshrink && frontswap_enabled; > + balloon_stats.frontswap_hysteresis = 20; > + balloon_stats.frontswap_inertia = 3; > +#endif > + schedule_delayed_work(&selfballoon_worker, > + balloon_stats.selfballoon_interval * HZ); > +#endifballoon_init already has paragraphs initialising balloon_stats -- this should go up with them I think.> + > return 0; > }> diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c > index a4ff225..5e30fc9 100644 > --- a/drivers/xen/xen-balloon.c > +++ b/drivers/xen/xen-balloon.c > @@ -190,6 +190,184 @@ static ssize_t store_target(struct sys_device *dev, > static SYSDEV_ATTR(target, S_IRUGO | S_IWUSR, > show_target, store_target); > > +#ifdef CONFIG_XEN_SELFBALLOONING > +extern struct delayed_work selfballoon_worker; > + > +static ssize_t show_selfballooning(struct sys_device *dev, struct sysdev_attribute *attr, > + char *buf) > +{ > + return sprintf(buf, "%d\n", balloon_stats.selfballooning_enabled); > +}For the show_* you define here you could largely use BALLOON_SHOW to define the function, I think.> static struct sysdev_attribute *balloon_attrs[] = { > &attr_target_kb, > @@ -197,7 +375,18 @@ static struct sysdev_attribute *balloon_attrs[] = { > &attr_schedule_delay.attr, > &attr_max_schedule_delay.attr, > &attr_retry_count.attr, > - &attr_max_retry_count.attr > + &attr_max_retry_count.attr, > +#ifdef CONFIG_XEN_SELFBALLOONING > + &attr_selfballooning, > + &attr_selfballoon_interval, > + &attr_selfballoon_downhysteresis, > + &attr_selfballoon_uphysteresis, > +#ifdef CONFIG_FRONTSWAP > + &attr_frontswap_selfshrinking, > + &attr_frontswap_hysteresis, > + &attr_frontswap_inertia, > +#endif > +#endif > }; > > static struct attribute *balloon_info_attrs[] = { > diff --git a/include/xen/balloon.h b/include/xen/balloon.h > index a2b22f0..aa36b82 100644 > --- a/include/xen/balloon.h > +++ b/include/xen/balloon.h > @@ -15,6 +15,17 @@ struct balloon_stats { > unsigned long max_schedule_delay; > unsigned long retry_count; > unsigned long max_retry_count; > +#ifdef CONFIG_XEN_SELFBALLOONING > + int selfballooning_enabled; > + unsigned int selfballoon_interval; > + unsigned int selfballoon_downhysteresis; > + unsigned int selfballoon_uphysteresis; > +#ifdef CONFIG_FRONTSWAP > + unsigned int frontswap_selfshrinking; > + unsigned int frontswap_hysteresis; > + unsigned int frontswap_inertia; > +#endif > +#endif > }; > > extern struct balloon_stats balloon_stats; > diff --git a/mm/mmap.c b/mm/mmap.c > index 772140c..030a47b 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -89,6 +89,13 @@ int sysctl_overcommit_ratio = 50; /* default is 50% */ > int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; > struct percpu_counter vm_committed_as; > > +unsigned long vm_get_committed_as(void) > +{ > + return percpu_counter_read_positive(&vm_committed_as); > + > +} > +EXPORT_SYMBOL(vm_get_committed_as); > +This needs to be split out and go upstream via the mm maintainers (with an extern in a header, not a C file). You add a lot of code to {xen-,}balloon.c which is entirely encased in #ifdef CONFIG_THIS_AND_THAT, without very much interaction with existing code in those files. That suggests to me that the code should live in its own file. Some careful consideration needs to go into the split between balloon.c (core kernel functionality, non-modular), xen-balloon.c (user interfaces, potentially modular, not actually at the moment) and $NEWFILE.c. In fact it seems as if all this functionality is a second user of the core functionality exposed by balloon.c which is independent of the existing xen-balloon.c user of that API. Therefore it should all live in a new module next to xen-balloon.c module rather than be intertwined with both xen-balloon.c and balloon.c. Is there any particular reason this (all) needs to be in the kernel at all? Can a userspace daemon, using (possibly new) statistics exported via /sys and /proc plus the existing balloon interfaces not implement much the same thing? One nice advantage of doing that is that a userspace daemon can more easily implement complex (or multiple) algorithms, tune them etc. If there is a good reason for this to be in kernel I think it should be expanded upon in the commit message. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vasiliy G Tolstov
2011-Jun-07 14:37 UTC
Re: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
On Tue, 2011-06-07 at 10:34 +0100, Ian Campbell wrote:> On Mon, 2011-06-06 at 23:12 +0100, Dan Magenheimer wrote: > > Hi Konrad (and anyone else interested in reviewing) -- > > > > Here is the patch for self-ballooning and frontswap-selfshrinking on > > top of 2.6.39, and on top of the frontswap commit-set you recently > > merged into your tree. > > > > Since this is the first time I''ve ported this code to a kernel > > later than 2.6.34(?), and since this code hasn''t gotten formal > > review even though it''s been floating about in various forms > > for 18 months, I thought I would post it publicly for review, > > rather than just ask for a pull. (I''ll put it in a git tree after > > a round or two of feedback.) > > > > This code complements the cleancache and frontswap patchsets to > > optimize support for Xen Transcendent Memory. The policy it > > implements is rudimentary, so will almost certainly need to > > improve over time. > > > > (Sorry if this line-wraps... attached also...) > > > > Thanks, > > Dan > > > > P.S. Anybody who actually uses this patch is encouraged to read > > http://oss.oracle.com/projects/tmem/dist/documentation/internals/linuxpatch > > from which the following brief documentation was extracted, > > which will probably be the git commit comment. > > > > > > > > [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking > > > > Selfballooning creates memory pressure by using the Xen balloon driver > > to decrease and increase available kernel memory, tracking a target value > > of "Committed_AS" (see /proc/meminfo). Sysfs tunables are provided to > > adjust the frequency at which memory size is adjusted, the rate at > > which ballooning is used to approach the target, and to disable > > selfballooning completely. As of 100827, selfballooning runs in a > > separate kernel thread, "selfballooning". > > > > Frontswap shrinking accounts for the fact that pages swapped to disk > > may sit on disk for a very long time, even if very rarely used or if > > the kernel knows they will never be used again. This is because > > the kernel need not reclaim disk space because the disk space costs > > very little and can be overwritten when necessary. When the same > > pages are in frontswap, however, they are taking up valuable RAM > > real estate. So when frontswap activity is otherwise stable and the > > guest kernel is not under memory pressure, the frontswap shrinker > > removes some pages from frontswap and returns them to kernel memory. > > Sysfs tunables are provided to adjust the frequency of shrinking > > opportunities and the shrinking rate, and to create more "inertia". > > > > Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> > > checkpatch.pl says: > > total: 8 errors, 14 warnings, 395 lines checked > > most of them look valid to me. (I commented on a few below before it > became clear you hadn''t run it yourself) > > > drivers/xen/Kconfig | 10 +++ > > drivers/xen/balloon.c | 109 ++++++++++++++++++++++++++ > > drivers/xen/tmem.c | 1 + > > drivers/xen/xen-balloon.c | 191 ++++++++++++++++++++++++++++++++++++++++++++- > > include/xen/balloon.h | 11 +++ > > mm/mmap.c | 7 ++ > > 6 files changed, 328 insertions(+), 1 deletions(-) > > > > ==> > > > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig > > index d589fe7..8f21fad 100644 > > --- a/drivers/xen/Kconfig > > +++ b/drivers/xen/Kconfig > > @@ -9,6 +9,16 @@ config XEN_BALLOON > > the system to expand the domain''s memory allocation, or alternatively > > return unneeded memory to the system. > > > > +config XEN_SELFBALLOONING > > + bool "dynamically self-balloon kernel memory to target" > > + depends on XEN && XEN_BALLOON && CLEANCACHE > > + default y > > + help > > + Self-ballooning dynamically balloons available kernel memory driven > > + by the current usage of anonymous memory ("committed AS") and > > + controlled by various sysfs-settable parameters. May be overridden > > + by the noselfballooning kernel boot parameter > > + > > config XEN_SCRUB_PAGES > > bool "Scrub pages before returning them to system" > > depends on XEN_BALLOON > > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > > index f54290b..a35b056 100644 > > --- a/drivers/xen/balloon.c > > +++ b/drivers/xen/balloon.c > > @@ -56,6 +56,7 @@ > > #include <xen/balloon.h> > > #include <xen/features.h> > > #include <xen/page.h> > > +#include <linux/frontswap.h> > > > > /* > > * balloon_process() state: > > @@ -74,6 +75,14 @@ enum bp_state { > > > > static DEFINE_MUTEX(balloon_mutex); > > > > +#ifdef CONFIG_XEN_SELFBALLOONING > > +extern int tmem_enabled; > > No externs in .c files please. > > > +static int use_selfballooning __read_mostly = 1; > > +#ifdef CONFIG_FRONTSWAP > > +static int use_frontswap_selfshrink __read_mostly = 1; > > +#endif > > +#endif > > + > > struct balloon_stats balloon_stats; > > EXPORT_SYMBOL_GPL(balloon_stats); > > > > @@ -94,6 +103,10 @@ static LIST_HEAD(ballooned_pages); > > /* Main work function, always executed in process context. */ > > static void balloon_process(struct work_struct *work); > > static DECLARE_DELAYED_WORK(balloon_worker, balloon_process); > > +#ifdef CONFIG_XEN_SELFBALLOONING > > +static void selfballoon_process(struct work_struct *work); > > +DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process); > > +#endif > > > > /* When ballooning out (allocating memory to return to Xen) we don''t really > > want the kernel to try too hard since that can trigger the oom killer. */ > > @@ -428,6 +441,86 @@ void free_xenballooned_pages(int nr_pages, struct page** pages) > > } > > EXPORT_SYMBOL(free_xenballooned_pages); > > > > +#ifdef CONFIG_XEN_SELFBALLOONING > > +#ifdef CONFIG_FRONTSWAP > > +#define frontswap_selfshrinking_enabled balloon_stats.frontswap_selfshrinking > > + > > +unsigned long frontswap_inertia_counter = 0; > > static? > > > + > > +static void frontswap_selfshrink(void) > > +{ > > + static unsigned long cur_frontswap_pages = 0; > > + static unsigned long last_frontswap_pages = 0; > > + static unsigned long tgt_frontswap_pages = 0; > > + > > + if (!balloon_stats.frontswap_selfshrinking) > > + return; > > + if (!balloon_stats.frontswap_hysteresis) > > + return; > > + last_frontswap_pages = cur_frontswap_pages; > > + cur_frontswap_pages = frontswap_curr_pages(); > > + if (!cur_frontswap_pages || (cur_frontswap_pages > last_frontswap_pages)) { > > + frontswap_inertia_counter = balloon_stats.frontswap_inertia; > > + return; > > + } > > + if (frontswap_inertia_counter && --frontswap_inertia_counter) > > + return; > > + //frontswap_inertia_counter = balloon_stats.frontswap_inertia; > > Please drop this. > > > + if ( cur_frontswap_pages <= balloon_stats.frontswap_hysteresis) > > + tgt_frontswap_pages = 0; > > + else tgt_frontswap_pages = cur_frontswap_pages - > > + (cur_frontswap_pages / balloon_stats.frontswap_hysteresis); > > + frontswap_shrink(tgt_frontswap_pages); > > +} > > + > > +static int __init no_frontswap_selfshrink_setup(char *s) > > +{ > > + use_frontswap_selfshrink = 0; > > + return 1; > > +} > > + > > +__setup("noselfshrink", no_frontswap_selfshrink_setup); > > +#else > > +#define frontswap_selfshrink() do {} while (0) > > +#define frontswap_selfshrinking_enabled 0 > > +#endif > > + > > +static void selfballoon_process(struct work_struct *work) > > +{ > > + extern unsigned long vm_get_committed_as(void); > > In a header please. > > > + unsigned long cur_pages, goal_pages, tgt_pages; > > + int reset_timer = 0; > > + > > + if (balloon_stats.selfballooning_enabled) { > > + tgt_pages = cur_pages = totalram_pages; > > + goal_pages = vm_get_committed_as(); > > + if (cur_pages > goal_pages) > > + tgt_pages = cur_pages - > > + (cur_pages - goal_pages) / balloon_stats.selfballoon_downhysteresis; > > + else if (cur_pages < goal_pages) > > + tgt_pages = cur_pages + > > + (goal_pages - cur_pages) / balloon_stats.selfballoon_uphysteresis; > > + balloon_set_new_target(tgt_pages); > > + reset_timer = 1; > > + } > > + if (frontswap_selfshrinking_enabled) { > > + frontswap_selfshrink(); > > + reset_timer = 1; > > + } > > + if (reset_timer) > > + schedule_delayed_work(&selfballoon_worker, > > + balloon_stats.selfballoon_interval * HZ); > > +} > > + > > +static int __init noselfballooning_setup(char *s) > > +{ > > + use_selfballooning = 0; > > + return 1; > > +} > > + > > +__setup("noselfballooning", noselfballooning_setup); > > +#endif > > + > > static int __init balloon_init(void) > > { > > unsigned long pfn, extra_pfn_end; > > @@ -468,6 +561,22 @@ static int __init balloon_init(void) > > __balloon_append(page); > > } > > > > +#ifdef CONFIG_XEN_SELFBALLOONING > > + balloon_stats.selfballooning_enabled = tmem_enabled && > > + use_selfballooning; > > + balloon_stats.selfballoon_interval = 5; > > + balloon_stats.selfballoon_downhysteresis = 8; > > + balloon_stats.selfballoon_uphysteresis = 1; > > +#ifdef CONFIG_FRONTSWAP > > + balloon_stats.frontswap_selfshrinking > > + use_frontswap_selfshrink && frontswap_enabled; > > + balloon_stats.frontswap_hysteresis = 20; > > + balloon_stats.frontswap_inertia = 3; > > +#endif > > + schedule_delayed_work(&selfballoon_worker, > > + balloon_stats.selfballoon_interval * HZ); > > +#endif > > balloon_init already has paragraphs initialising balloon_stats -- this > should go up with them I think. > > > + > > return 0; > > } > > > diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c > > index a4ff225..5e30fc9 100644 > > --- a/drivers/xen/xen-balloon.c > > +++ b/drivers/xen/xen-balloon.c > > @@ -190,6 +190,184 @@ static ssize_t store_target(struct sys_device *dev, > > static SYSDEV_ATTR(target, S_IRUGO | S_IWUSR, > > show_target, store_target); > > > > +#ifdef CONFIG_XEN_SELFBALLOONING > > +extern struct delayed_work selfballoon_worker; > > + > > +static ssize_t show_selfballooning(struct sys_device *dev, struct sysdev_attribute *attr, > > + char *buf) > > +{ > > + return sprintf(buf, "%d\n", balloon_stats.selfballooning_enabled); > > +} > > For the show_* you define here you could largely use BALLOON_SHOW to > define the function, I think. > > > static struct sysdev_attribute *balloon_attrs[] = { > > &attr_target_kb, > > @@ -197,7 +375,18 @@ static struct sysdev_attribute *balloon_attrs[] = { > > &attr_schedule_delay.attr, > > &attr_max_schedule_delay.attr, > > &attr_retry_count.attr, > > - &attr_max_retry_count.attr > > + &attr_max_retry_count.attr, > > +#ifdef CONFIG_XEN_SELFBALLOONING > > + &attr_selfballooning, > > + &attr_selfballoon_interval, > > + &attr_selfballoon_downhysteresis, > > + &attr_selfballoon_uphysteresis, > > +#ifdef CONFIG_FRONTSWAP > > + &attr_frontswap_selfshrinking, > > + &attr_frontswap_hysteresis, > > + &attr_frontswap_inertia, > > +#endif > > +#endif > > }; > > > > static struct attribute *balloon_info_attrs[] = { > > diff --git a/include/xen/balloon.h b/include/xen/balloon.h > > index a2b22f0..aa36b82 100644 > > --- a/include/xen/balloon.h > > +++ b/include/xen/balloon.h > > @@ -15,6 +15,17 @@ struct balloon_stats { > > unsigned long max_schedule_delay; > > unsigned long retry_count; > > unsigned long max_retry_count; > > +#ifdef CONFIG_XEN_SELFBALLOONING > > + int selfballooning_enabled; > > + unsigned int selfballoon_interval; > > + unsigned int selfballoon_downhysteresis; > > + unsigned int selfballoon_uphysteresis; > > +#ifdef CONFIG_FRONTSWAP > > + unsigned int frontswap_selfshrinking; > > + unsigned int frontswap_hysteresis; > > + unsigned int frontswap_inertia; > > +#endif > > +#endif > > }; > > > > extern struct balloon_stats balloon_stats; > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 772140c..030a47b 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -89,6 +89,13 @@ int sysctl_overcommit_ratio = 50; /* default is 50% */ > > int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; > > struct percpu_counter vm_committed_as; > > > > +unsigned long vm_get_committed_as(void) > > +{ > > + return percpu_counter_read_positive(&vm_committed_as); > > + > > +} > > +EXPORT_SYMBOL(vm_get_committed_as); > > + > > This needs to be split out and go upstream via the mm maintainers (with > an extern in a header, not a C file). > > You add a lot of code to {xen-,}balloon.c which is entirely encased in > #ifdef CONFIG_THIS_AND_THAT, without very much interaction with existing > code in those files. That suggests to me that the code should live in > its own file. > > Some careful consideration needs to go into the split between balloon.c > (core kernel functionality, non-modular), xen-balloon.c (user > interfaces, potentially modular, not actually at the moment) and > $NEWFILE.c. > > In fact it seems as if all this functionality is a second user of the > core functionality exposed by balloon.c which is independent of the > existing xen-balloon.c user of that API. Therefore it should all live in > a new module next to xen-balloon.c module rather than be intertwined > with both xen-balloon.c and balloon.c. > > Is there any particular reason this (all) needs to be in the kernel at > all? Can a userspace daemon, using (possibly new) statistics exported > via /sys and /proc plus the existing balloon interfaces not implement > much the same thing? One nice advantage of doing that is that a > userspace daemon can more easily implement complex (or multiple) > algorithms, tune them etc. If there is a good reason for this to be in > kernel I think it should be expanded upon in the commit message. > > Ian.I''m agree with Ian. Some time ago i write daemon that balloon up/down memory, most of the time it work''s fine. But with specific software - for example mongodb and java aplications - algorithm needs changing and after some time i go to idea, that userspace can acts as ulatencyd - kernel provide interface to calculate and to up/down memory (in this case kernel need to provide frontswap shrink...) userspace daemon use statistics and interface can apply various memory calcalation patterns to various workloads and software used in server. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2011-Jun-07 16:21 UTC
RE: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]Thanks, Ian, for taking the time for review! Comments below.> > Since this is the first time I''ve ported this code to a kernel > > later than 2.6.34(?), and since this code hasn''t gotten formal > > review even though it''s been floating about in various forms > > for 18 months, I thought I would post it publicly for review, > > rather than just ask for a pull. (I''ll put it in a git tree after > > a round or two of feedback.) > > checkpatch.pl says: > > total: 8 errors, 14 warnings, 395 lines checked > > most of them look valid to me. (I commented on a few below before it > became clear you hadn''t run it yourself)Oops, yes, sorry, I should have done that first :-}> > +unsigned long frontswap_inertia_counter = 0; > > static?Yes, will fix.> > + //frontswap_inertia_counter = balloon_stats.frontswap_inertia; > > Please drop this.Will fix.> > + extern unsigned long vm_get_committed_as(void); > > In a header please.In an ideal world, yes, see below.> > + balloon_stats.frontswap_inertia = 3; > > +#endif > > + schedule_delayed_work(&selfballoon_worker, > > + balloon_stats.selfballoon_interval * HZ); > > +#endif > > balloon_init already has paragraphs initialising balloon_stats -- this > should go up with them I think.OK, will move.> > +static ssize_t show_selfballooning(struct sys_device *dev, struct sysdev_attribute *attr, > > + char *buf) > > +{ > > + return sprintf(buf, "%d\n", balloon_stats.selfballooning_enabled); > > +} > > For the show_* you define here you could largely use BALLOON_SHOW to > define the function, I think.Thanks, will take a look at that. I think when I first wrote this, that wasn''t the case but may work now.> > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -89,6 +89,13 @@ int sysctl_overcommit_ratio = 50; /* default is 50% */ > > int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; > > struct percpu_counter vm_committed_as; > > > > +unsigned long vm_get_committed_as(void) > > +{ > > + return percpu_counter_read_positive(&vm_committed_as); > > + > > +} > > +EXPORT_SYMBOL(vm_get_committed_as); > > + > > This needs to be split out and go upstream via the mm maintainers (with > an extern in a header, not a C file).Although you''re right, as I am fresh off a 2-1/2 year odyssey of upstreaming cleancache, AND since this is almost certainly Xen specific AND since there will likely be some changes over time which could conceivably make this unnecessary, I would be content with carrying this in a Xen-only tree for the foreseeable future.> You add a lot of code to {xen-,}balloon.c which is entirely encased in > #ifdef CONFIG_THIS_AND_THAT, without very much interaction with existing > code in those files. That suggests to me that the code should live in > its own file. > > Some careful consideration needs to go into the split between balloon.c > (core kernel functionality, non-modular), xen-balloon.c (user > interfaces, potentially modular, not actually at the moment) and > $NEWFILE.c. > > In fact it seems as if all this functionality is a second user of the > core functionality exposed by balloon.c which is independent of the > existing xen-balloon.c user of that API. Therefore it should all live in > a new module next to xen-balloon.c module rather than be intertwined > with both xen-balloon.c and balloon.c.Could be. I think when I first wrote this, it would have required some more things to be externified, so I didn''t bother. Will take a look again. I was wondering why xen-balloon.c and balloon.c are separate to begin with? It forces a global variable balloon_stats which could be static otherwise. Though it might be possible for a kernel to be built with only one of the two to save space, is there any good reason? Or is it just because the file was getting too big? I note that both of them have a balloon_init and both pr_info a (not quite identical) message.> Is there any particular reason this (all) needs to be in the kernel at > all? Can a userspace daemon, using (possibly new) statistics exported > via /sys and /proc plus the existing balloon interfaces not implement > much the same thing? One nice advantage of doing that is that a > userspace daemon can more easily implement complex (or multiple) > algorithms, tune them etc. If there is a good reason for this to be in > kernel I think it should be expanded upon in the commit message.Will answer separately since I see this part of the thread has already been continued. Thanks again! Dan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jun-07 16:40 UTC
RE: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
On Tue, 2011-06-07 at 17:21 +0100, Dan Magenheimer wrote:> > > Although you''re right, as I am fresh off a 2-1/2 year odyssey of > upstreaming cleancache, AND since this is almost certainly Xen > specific AND since there will likely be some changes over time > which could conceivably make this unnecessary, I would be content > with carrying this in a Xen-only tree for the foreseeable future.As someone who is fresh of a 5+ year odyssey of upstreaming a Xen-only tree into mainline I must strongly object to this approach. We have had a long uphill battle to get ourselves to the state we are in today and the approach you are suggesting is absolutely counter to the philosophy we have built up whilst doing that and accepting this undoes all of the good work up until now. Getting stuff into a Xen tree is not an aim in and of itself but merely a conduit towards upstream. Nothing should be going into the Xen stable trees with the explicit aim of staying there for the foreseeable future. If you want to run a tmem-only tree with stuff you don''t intend to send upstream yet in it then that is your prerogative, please don''t ask the rest of us to carry that burden. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2011-Jun-07 16:40 UTC
RE: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
> > Is there any particular reason this (all) needs to be in the kernel at > > all? Can a userspace daemon, using (possibly new) statistics exported > > via /sys and /proc plus the existing balloon interfaces not implement > > much the same thing? One nice advantage of doing that is that a > > userspace daemon can more easily implement complex (or multiple) > > algorithms, tune them etc. If there is a good reason for this to be in > > kernel I think it should be expanded upon in the commit message. > > > > Ian. > > I''m agree with Ian. Some time ago i write daemon that balloon up/down > memory, most of the time it work''s fine. But with specific software - > for example mongodb and java aplications - algorithm needs changing and > after some time i go to idea, that userspace can acts as ulatencyd - > kernel provide interface to calculate and to up/down memory (in this > case kernel need to provide frontswap shrink...) userspace daemon use > statistics and interface can apply various memory calcalation patterns > to various workloads and software used in server.It certainly *can* be done. I demonstrated that at Xen Summit 2008 and the userland code has been in the Xen tools/misc tree since Fall 2008. I found some time ago that aggressive ballooning with widely varying workload frequently caused OOMs that went away when the selfballooning and frontswap-selfshrinking code was put in the kernel, presumably making it more responsive. The often fatal nature of OOMs make it difficult to debug... it may be possible to change the userspace code, for example, to sample and adjust at a higher frequency... IIRC I just went with doing it in the kernel because it was what worked. The real objective is for the kernel to be less "selfish" with its memory. Ideally, there should be a generic mechanism for the kernel to "surrender" memory that it can live without... and perhaps it plugs into the balloon driver, perhaps hotplug, or perhaps something else. However "memory it can live without" is as vague as (and essentially the complement of) "working set". The use of "committed_as" is really just one estimator of how "lean" the kernel could be and it happened to be exposed in userspace. It''s also very aggressive, probably too aggressive unless tmem is running. I suspect there are better algorithms... and those algorithms may not have all data exposed in userspace. Dan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jun-07 16:46 UTC
RE: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
On Tue, 2011-06-07 at 17:21 +0100, Dan Magenheimer wrote:> > I was wondering why xen-balloon.c and balloon.c are separate to > begin with? It forces a global variable balloon_stats which could > be static otherwise. Though it might be possible for a kernel to be > built with only one of the two to save space, is there any good > reason? > Or is it just because the file was getting too big?balloon.c is the core functionality. xen-balloon.c is the "user interface" parts, i.e. the proc and sys interfaces. The stuff in balloon.c has more users than just the user interface parts. e.g. other drivers want it etc. See 803eb047a28d239809fff1f87274cdaa94e0d8ea for more info which was done for the use of e.g. ca47ceaa2c407bbddd395c1807b616042365bd65. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2011-Jun-07 16:59 UTC
RE: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
> From: Ian Campbell [mailto:Ian.Campbell@eu.citrix.com] > > > > Although you''re right, as I am fresh off a 2-1/2 year odyssey of > > upstreaming cleancache, AND since this is almost certainly Xen > > specific AND since there will likely be some changes over time > > which could conceivably make this unnecessary, I would be content > > with carrying this in a Xen-only tree for the foreseeable future. > > As someone who is fresh of a 5+ year odyssey of upstreaming a Xen-only > tree into mainline I must strongly object to this approach. > > We have had a long uphill battle to get ourselves to the state we are in > today and the approach you are suggesting is absolutely counter to the > philosophy we have built up whilst doing that and accepting this undoes > all of the good work up until now. > > Getting stuff into a Xen tree is not an aim in and of itself but merely > a conduit towards upstream. Nothing should be going into the Xen stable > trees with the explicit aim of staying there for the foreseeable future. > > If you want to run a tmem-only tree with stuff you don''t intend to send > upstream yet in it then that is your prerogative, please don''t ask the > rest of us to carry that burden.Well that''s a bit harsh. Perhaps I should clarify: By "for the foreseeable future", I meant I don''t expect it to be a candidate for 3.1 or probably even 3.2. I didn''t mean to imply that I wasn''t going to try, just that I wasn''t going to try for 3.1. I see Konrad''s tree as kind of a linux-next specific to Xen where Xen-ish functionality/fixes can be exposed "officially" to Xen users before the upstreaming battle is fought. I''ve had many requests for tmem over the last couple of years but haven''t had a good foundation for delivery. Are you saying nothing should go in Konrad''s tree unless it is immediately upstreamable? Related, there''s a really good article* in lwn.net of James Bottomley''s talk about the Android fork (subscribers only as of now) that I think nicely summarizes our frustration with upstreaming battles. Dan * http://lwn.net/Articles/446297/ ... This may not be public for a week or two. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jun-07 19:37 UTC
RE: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
On Tue, 2011-06-07 at 17:59 +0100, Dan Magenheimer wrote:> > From: Ian Campbell [mailto:Ian.Campbell@eu.citrix.com] > > > > > > Although you''re right, as I am fresh off a 2-1/2 year odyssey of > > > upstreaming cleancache, AND since this is almost certainly Xen > > > specific AND since there will likely be some changes over time > > > which could conceivably make this unnecessary, I would be content > > > with carrying this in a Xen-only tree for the foreseeable future. > > > > As someone who is fresh of a 5+ year odyssey of upstreaming a Xen-only > > tree into mainline I must strongly object to this approach. > > > > We have had a long uphill battle to get ourselves to the state we are in > > today and the approach you are suggesting is absolutely counter to the > > philosophy we have built up whilst doing that and accepting this undoes > > all of the good work up until now. > > > > Getting stuff into a Xen tree is not an aim in and of itself but merely > > a conduit towards upstream. Nothing should be going into the Xen stable > > trees with the explicit aim of staying there for the foreseeable future. > > > > If you want to run a tmem-only tree with stuff you don''t intend to send > > upstream yet in it then that is your prerogative, please don''t ask the > > rest of us to carry that burden. > > Well that''s a bit harsh. Perhaps I should clarify: By "for the > foreseeable future", I meant I don''t expect it to be a candidate > for 3.1 or probably even 3.2. I didn''t mean to imply that > I wasn''t going to try, just that I wasn''t going to try for 3.1.I think that''s fine, but your initial statement sounded a lot less positive and had me quite worried. The old attitude of getting something into the Xen tree as an end goal and leaving the rest to someone else (or no one at all) is something which needs to be stomped on hard (I''m talking more generally now, you''ve explained this specific case)> I see Konrad''s tree as kind of a linux-next specific to Xen where > Xen-ish functionality/fixes can be exposed "officially" to Xen users > before the upstreaming battle is fought.linux-next is very specifically about stuff which is ready for the next merge window, i.e. stuff which a maintainer intends to send along to Linus when the merge window opens. It is also explicitly not for works in progress. You have a tree in linux-next so I assume you have seen and read Stephen''s boilerplate intro. I think that exposing Xen users to features which have not been proposed upstream (whether due to fears of "battles" or otherwise) does neither our users nor us any favours in the long run. Once things are exposed to users in an official way they begin to build expectations and requirements and when such features are subsequently taken upstream it is trying to retain these while addressing upstream concerns and requests for change which is one of the most frequent sources of so called battles and other tensions. The specific hunk which spawned this sub-thread (exporting something from mm/mmap.c) is exactly the sort of change which can lead to tension if it is not discussed with the mm maintainers early on. IMHO it is much better to simply work with upstream, with upstreaming in mind from day one and to release early and often to LKML and to encourage Xen contributors (who are really Linux contributors in this context) to behave in that way also.> I''ve had many requests > for tmem over the last couple of years but haven''t had a good > foundation for delivery.Every developer always has the option of setting up a git tree and publishing specific branches or of posting regular patch kits on LKML etc etc. That is how everyone treats their development work without asking for it to be included in a subsystem tree before it is actually ready or has been through review etc.> Are you saying nothing should go in > Konrad''s tree unless it is immediately upstreamable?It is up to Konrad what he includes in his tree. I think the main thing here is to stop thinking in terms of Xen and "upstream" trees as separate things. The Linux trees which Xen maintainers publish are part of the upstream workflow as much as the tip.git or block/net subsystem maintainer''s trees and should be treated as such. There really should be no _concept_ of an official Xen tree other than as part of the flow from contributors to upstream. IOW there would ideally only be the various Xen maintainer''s linux-next branches and nothing more official than that. However if we really feel there is a need for there to be branches (whichever tree they live in) which we (test and) bless as official Xen.org trees then I think we should aim to be as close to upstream as we are able (and these days we are in a very good position to achieve this aim) and that means those branches are effectively equivalent to either a linux-next style branch where patches in that tree are believed to be stable and ready for upstream or a stable/longterm style branch containing backports of already upstreamed patches (in the case of base versions maintained for longer than normal term). So in the case of official branches, yes, I think its contents should, by definition, be ready for upstream. I originally wrote about this need to switch how we as a community thinks about "Linux upstream" last year, see: http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00078.html I don''t think anything much has changed except that we are in an even better position WRT upstream so the arguments for working with upstream first and foremost are even stronger. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel Kiper
2011-Jun-09 17:21 UTC
Re: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
On Mon, Jun 06, 2011 at 03:12:28PM -0700, Dan Magenheimer wrote: [...]> +static void selfballoon_process(struct work_struct *work) > +{ > + extern unsigned long vm_get_committed_as(void); > + unsigned long cur_pages, goal_pages, tgt_pages; > + int reset_timer = 0; > + > + if (balloon_stats.selfballooning_enabled) { > + tgt_pages = cur_pages = totalram_pages;tgt_pages = cur_pages = balloon_stats.current_pages;> + goal_pages = vm_get_committed_as();goal_pages = percpu_counter_read_positive(&vm_committed_as) + balloon_stats.current_pages - totalram_pages;> + if (cur_pages > goal_pages) > + tgt_pages = cur_pages - > + (cur_pages - goal_pages) / balloon_stats.selfballoon_downhysteresis; > + else if (cur_pages < goal_pages) > + tgt_pages = cur_pages + > + (goal_pages - cur_pages) / balloon_stats.selfballoon_uphysteresis; > + balloon_set_new_target(tgt_pages); > + reset_timer = 1; > + } > + if (frontswap_selfshrinking_enabled) { > + frontswap_selfshrink(); > + reset_timer = 1; > + } > + if (reset_timer) > + schedule_delayed_work(&selfballoon_worker, > + balloon_stats.selfballoon_interval * HZ); > +}Daniel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2011-Jun-09 21:12 UTC
RE: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
> From: Daniel Kiper [mailto:dkiper@net-space.pl] > Subject: Re: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap- > selfshrinking> goal_pages = percpu_counter_read_positive(&vm_committed_as) +!!! (Ian cc''ed) Thanks for catching this! I had thought vm_committed_as was not exported (and may not have been when I first coded this). Now that it is, the "core kernel" change that was part of the changeset is no longer required, thus removing the dependency on core kernel/mm review! Yay!> On Mon, Jun 06, 2011 at 03:12:28PM -0700, Dan Magenheimer wrote: > > [...] > > > +static void selfballoon_process(struct work_struct *work) > > +{ > > + extern unsigned long vm_get_committed_as(void); > > + unsigned long cur_pages, goal_pages, tgt_pages; > > + int reset_timer = 0; > > + > > + if (balloon_stats.selfballooning_enabled) { > > + tgt_pages = cur_pages = totalram_pages; > > tgt_pages = cur_pages = balloon_stats.current_pages; > > > + goal_pages = vm_get_committed_as(); > > goal_pages = percpu_counter_read_positive(&vm_committed_as) + > balloon_stats.current_pages - totalram_pages; > > > + if (cur_pages > goal_pages) > > + tgt_pages = cur_pages - > > + (cur_pages - goal_pages) / balloon_stats.selfballoon_downhysteresis; > > + else if (cur_pages < goal_pages) > > + tgt_pages = cur_pages + > > + (goal_pages - cur_pages) / balloon_stats.selfballoon_uphysteresis; > > + balloon_set_new_target(tgt_pages); > > + reset_timer = 1; > > + } > > + if (frontswap_selfshrinking_enabled) { > > + frontswap_selfshrink(); > > + reset_timer = 1; > > + } > > + if (reset_timer) > > + schedule_delayed_work(&selfballoon_worker, > > + balloon_stats.selfballoon_interval * HZ); > > +}Thanks for the review Daniel! Do these changes make it compatible with your hotplug work? In your second correction (goal_pages), is it possible that balloon_stats.current_pages is less then totalram_pages? If so, then goal_pages is less than vm_committed_as, which I don''t think is ever intended. Thanks, Dan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel Kiper
2011-Jun-10 11:53 UTC
Re: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
On Thu, Jun 09, 2011 at 02:12:48PM -0700, Dan Magenheimer wrote:> > From: Daniel Kiper [mailto:dkiper@net-space.pl] > > Subject: Re: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap- > > selfshrinking > > > goal_pages = percpu_counter_read_positive(&vm_committed_as) + > > !!! (Ian cc''ed) > > Thanks for catching this! I had thought vm_committed_as > was not exported (and may not have been when I first > coded this). Now that it is, the "core kernel" change > that was part of the changeset is no longer required, thus > removing the dependency on core kernel/mm review! Yay! > > > On Mon, Jun 06, 2011 at 03:12:28PM -0700, Dan Magenheimer wrote: > > > > [...] > > > > > +static void selfballoon_process(struct work_struct *work) > > > +{ > > > + extern unsigned long vm_get_committed_as(void); > > > + unsigned long cur_pages, goal_pages, tgt_pages; > > > + int reset_timer = 0; > > > + > > > + if (balloon_stats.selfballooning_enabled) { > > > + tgt_pages = cur_pages = totalram_pages; > > > > tgt_pages = cur_pages = balloon_stats.current_pages; > > > > > + goal_pages = vm_get_committed_as(); > > > > goal_pages = percpu_counter_read_positive(&vm_committed_as) + > > balloon_stats.current_pages - totalram_pages; > > > > > + if (cur_pages > goal_pages) > > > + tgt_pages = cur_pages - > > > + (cur_pages - goal_pages) / balloon_stats.selfballoon_downhysteresis; > > > + else if (cur_pages < goal_pages) > > > + tgt_pages = cur_pages + > > > + (goal_pages - cur_pages) / balloon_stats.selfballoon_uphysteresis; > > > + balloon_set_new_target(tgt_pages); > > > + reset_timer = 1; > > > + } > > > + if (frontswap_selfshrinking_enabled) { > > > + frontswap_selfshrink(); > > > + reset_timer = 1; > > > + } > > > + if (reset_timer) > > > + schedule_delayed_work(&selfballoon_worker, > > > + balloon_stats.selfballoon_interval * HZ); > > > +} > > Thanks for the review Daniel! Do these changes make it compatible > with your hotplug work?Eariler versions of memory hotplug worked fine with tmem. I did not test latest one with tmem, however, it should work without significant changes. If you wish I could help you to intergrate tmem with memory hotplug.> In your second correction (goal_pages), is it possible that > balloon_stats.current_pages is less then totalram_pages?balloon_stats.current_pages is always greater than totalram_pages. balloon_stats.current_pages counts all pages currently available to the system. totalram_pages counts only pages which could be allocated for user space processes (more or less). It means that balloon_stats.current_pages - totalram_pages represents all pages allocated by kernel (more or less). Daniel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jun-14 09:15 UTC
Re: [Xen-devel] [PATCH] [linux-2.6.39.x for xen] tmem: self-ballooning and frontswap-selfshrinking
On Thu, 2011-06-09 at 18:21 +0100, Daniel Kiper wrote:> On Mon, Jun 06, 2011 at 03:12:28PM -0700, Dan Magenheimer wrote: > > [...] > > > +static void selfballoon_process(struct work_struct *work) > > +{ > > + extern unsigned long vm_get_committed_as(void); > > + unsigned long cur_pages, goal_pages, tgt_pages; > > + int reset_timer = 0; > > + > > + if (balloon_stats.selfballooning_enabled) { > > + tgt_pages = cur_pages = totalram_pages; > > tgt_pages = cur_pages = balloon_stats.current_pages;Linux coding style prefers to avoid these multiple assignments. It favours e.g. tgt_pages = balloon_stats.current_pages; cur_pages = balloon_stats.current_pages; etc. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel