Dan Magenheimer
2011-Sep-27 15:03 UTC
[Xen-devel] [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far
Note: This patch is also now in a git tree at: git://oss.oracle.com/git/djm/tmem.git#selfballoon-fix-v2 The balloon driver''s "current_pages" is very different from totalram_pages. Self-ballooning needs to be driven by the latter. Also, Committed_AS doesn''t account for pages used by the kernel so enforce a floor for when there are little or no user-space threads using memory. The floor function includes a "safety margin" tuneable in case we discover later that the floor function is still too aggressive in some workloads, though likely it will not be needed. Changes since version 1: - tuneable safety margin added [v2: konrad.wilk@oracle.com: make safety margin tuneable] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c index 6ea852e..ceaada6 100644 --- a/drivers/xen/xen-selfballoon.c +++ b/drivers/xen/xen-selfballoon.c @@ -93,6 +93,12 @@ static unsigned int selfballoon_uphysteresis __read_mostly = 1; /* In HZ, controls frequency of worker invocation. */ static unsigned int selfballoon_interval __read_mostly = 5; +/* + * Scale factor for safety margin for minimum selfballooning target for balloon. + * Default adds about 3% (4/128) of max_pfn. + */ +static unsigned int selfballoon_safety_margin = 4; + static void selfballoon_process(struct work_struct *work); static DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process); @@ -195,14 +201,13 @@ __setup("selfballooning", xen_selfballooning_setup); */ static void selfballoon_process(struct work_struct *work) { - unsigned long cur_pages, goal_pages, tgt_pages; + unsigned long cur_pages, goal_pages, tgt_pages, floor_pages; bool reset_timer = false; if (xen_selfballooning_enabled) { - cur_pages = balloon_stats.current_pages; + cur_pages = totalram_pages; tgt_pages = cur_pages; /* default is no change */ - goal_pages = percpu_counter_read_positive(&vm_committed_as) + - balloon_stats.current_pages - totalram_pages; + goal_pages = percpu_counter_read_positive(&vm_committed_as); #ifdef CONFIG_FRONTSWAP /* allow space for frontswap pages to be repatriated */ if (frontswap_selfshrinking && frontswap_enabled) @@ -217,7 +222,13 @@ static void selfballoon_process(struct work_struct *work) ((goal_pages - cur_pages) / selfballoon_uphysteresis); /* else if cur_pages == goal_pages, no change */ - balloon_set_new_target(tgt_pages); + floor_pages = totalreserve_pages + + ((roundup_pow_of_two(max_pfn) / 128) * + selfballoon_safety_margin); + if (tgt_pages < floor_pages) + tgt_pages = floor_pages; + balloon_set_new_target(tgt_pages + + balloon_stats.current_pages - totalram_pages); reset_timer = true; } #ifdef CONFIG_FRONTSWAP @@ -340,6 +351,30 @@ static ssize_t store_selfballoon_uphys(struct sys_device *dev, static SYSDEV_ATTR(selfballoon_uphysteresis, S_IRUGO | S_IWUSR, show_selfballoon_uphys, store_selfballoon_uphys); +SELFBALLOON_SHOW(selfballoon_safety_margin, "%d\n", selfballoon_safety_margin); + +static ssize_t store_selfballoon_safety_margin(struct sys_device *dev, + struct sysdev_attribute *attr, + const char *buf, + size_t count) +{ + unsigned long val; + int err; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + err = strict_strtoul(buf, 10, &val); + if (err || val == 0 || val > 128) + return -EINVAL; + selfballoon_safety_margin = val; + return count; +} + +static SYSDEV_ATTR(selfballoon_safety_margin, S_IRUGO | S_IWUSR, + show_selfballoon_safety_margin, + store_selfballoon_safety_margin); + + #ifdef CONFIG_FRONTSWAP SELFBALLOON_SHOW(frontswap_selfshrinking, "%d\n", frontswap_selfshrinking); @@ -421,6 +456,7 @@ static struct attribute *selfballoon_attrs[] = { &attr_selfballoon_interval.attr, &attr_selfballoon_downhysteresis.attr, &attr_selfballoon_uphysteresis.attr, + &attr_selfballoon_safety_margin.attr, #ifdef CONFIG_FRONTSWAP &attr_frontswap_selfshrinking.attr, &attr_frontswap_hysteresis.attr, _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
David Vrabel
2011-Sep-27 15:57 UTC
[Xen-devel] Re: [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far
On 27/09/11 16:03, Dan Magenheimer wrote:> Note: This patch is also now in a git tree at: > > git://oss.oracle.com/git/djm/tmem.git#selfballoon-fix-v2 > > The balloon driver''s "current_pages" is very different from > totalram_pages. Self-ballooning needs to be driven by > the latter.I don''t think this part of the change makes any difference. It looks like it rearranges the maths without changing the end result (other than slightly increasing the rate of change). I think this (partial, untested) patch is equivalent: diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c index 1b4afd8..b207e89 100644 --- a/drivers/xen/xen-selfballoon.c +++ b/drivers/xen/xen-selfballoon.c @@ -194,14 +194,15 @@ __setup("selfballooning", xen_selfballooning_setup); */ static void selfballoon_process(struct work_struct *work) { - unsigned long cur_pages, goal_pages, tgt_pages; + unsigned long cur_pages, goal_pages, tgt_pages, rsvd_pages, floor_pages; bool reset_timer = false; if (xen_selfballooning_enabled) { cur_pages = balloon_stats.current_pages; tgt_pages = cur_pages; /* default is no change */ - goal_pages = percpu_counter_read_positive(&vm_committed_as) + - balloon_stats.current_pages - totalram_pages; + rsvd_pages = cur_pages - totalram_pages; + goal_pages = percpu_counter_read_positive(&vm_committed_as) + + rsvd_pages; #ifdef CONFIG_FRONTSWAP /* allow space for frontswap pages to be repatriated */ if (frontswap_selfshrinking && frontswap_enabled) @@ -216,6 +217,11 @@ static void selfballoon_process(struct work_struct *work) ((goal_pages - cur_pages) / selfballoon_uphysteresis); /* else if cur_pages == goal_pages, no change */ + floor_pages = rsvd_pages + totalreserve_pages + + ((roundup_pow_of_two(max_pfn) / 128) * + selfballoon_safety_margin); + if (tgt_pages < floor_pages) + tgt_pages = floor_pages; balloon_set_new_target(tgt_pages); reset_timer = true; }> Also, Committed_AS doesn''t account for pages > used by the kernel so enforce a floor for when there are little > or no user-space threads using memory. The floor function > includes a "safety margin" tuneable in case we discover later > that the floor function is still too aggressive in some workloads, > though likely it will not be needed.The sysfs file isn''t documented (but then neither are any of the other (self-)balloon driver sysfs files). I don''t think "safety_margin" is the right name. Perhaps, "min_reservation_ratio" or something like that? David> Changes since version 1: > - tuneable safety margin added > > [v2: konrad.wilk@oracle.com: make safety margin tuneable] > > Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> > > diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c > index 6ea852e..ceaada6 100644 > --- a/drivers/xen/xen-selfballoon.c > +++ b/drivers/xen/xen-selfballoon.c > @@ -93,6 +93,12 @@ static unsigned int selfballoon_uphysteresis __read_mostly = 1; > /* In HZ, controls frequency of worker invocation. */ > static unsigned int selfballoon_interval __read_mostly = 5; > > +/* > + * Scale factor for safety margin for minimum selfballooning target for balloon. > + * Default adds about 3% (4/128) of max_pfn. > + */ > +static unsigned int selfballoon_safety_margin = 4; > + > static void selfballoon_process(struct work_struct *work); > static DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process); > > @@ -195,14 +201,13 @@ __setup("selfballooning", xen_selfballooning_setup); > */ > static void selfballoon_process(struct work_struct *work) > { > - unsigned long cur_pages, goal_pages, tgt_pages; > + unsigned long cur_pages, goal_pages, tgt_pages, floor_pages; > bool reset_timer = false; > > if (xen_selfballooning_enabled) { > - cur_pages = balloon_stats.current_pages; > + cur_pages = totalram_pages; > tgt_pages = cur_pages; /* default is no change */ > - goal_pages = percpu_counter_read_positive(&vm_committed_as) + > - balloon_stats.current_pages - totalram_pages; > + goal_pages = percpu_counter_read_positive(&vm_committed_as); > #ifdef CONFIG_FRONTSWAP > /* allow space for frontswap pages to be repatriated */ > if (frontswap_selfshrinking && frontswap_enabled) > @@ -217,7 +222,13 @@ static void selfballoon_process(struct work_struct *work) > ((goal_pages - cur_pages) / > selfballoon_uphysteresis); > /* else if cur_pages == goal_pages, no change */ > - balloon_set_new_target(tgt_pages); > + floor_pages = totalreserve_pages + > + ((roundup_pow_of_two(max_pfn) / 128) * > + selfballoon_safety_margin); > + if (tgt_pages < floor_pages) > + tgt_pages = floor_pages; > + balloon_set_new_target(tgt_pages + > + balloon_stats.current_pages - totalram_pages); > reset_timer = true; > } > #ifdef CONFIG_FRONTSWAP > @@ -340,6 +351,30 @@ static ssize_t store_selfballoon_uphys(struct sys_device *dev, > static SYSDEV_ATTR(selfballoon_uphysteresis, S_IRUGO | S_IWUSR, > show_selfballoon_uphys, store_selfballoon_uphys); > > +SELFBALLOON_SHOW(selfballoon_safety_margin, "%d\n", selfballoon_safety_margin); > + > +static ssize_t store_selfballoon_safety_margin(struct sys_device *dev, > + struct sysdev_attribute *attr, > + const char *buf, > + size_t count) > +{ > + unsigned long val; > + int err; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + err = strict_strtoul(buf, 10, &val); > + if (err || val == 0 || val > 128) > + return -EINVAL; > + selfballoon_safety_margin = val; > + return count; > +} > + > +static SYSDEV_ATTR(selfballoon_safety_margin, S_IRUGO | S_IWUSR, > + show_selfballoon_safety_margin, > + store_selfballoon_safety_margin); > + > + > #ifdef CONFIG_FRONTSWAP > SELFBALLOON_SHOW(frontswap_selfshrinking, "%d\n", frontswap_selfshrinking); > > @@ -421,6 +456,7 @@ static struct attribute *selfballoon_attrs[] = { > &attr_selfballoon_interval.attr, > &attr_selfballoon_downhysteresis.attr, > &attr_selfballoon_uphysteresis.attr, > + &attr_selfballoon_safety_margin.attr, > #ifdef CONFIG_FRONTSWAP > &attr_frontswap_selfshrinking.attr, > &attr_frontswap_hysteresis.attr,_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2011-Sep-27 16:19 UTC
[Xen-devel] RE: [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far
> From: David Vrabel [mailto:david.vrabel@citrix.com] > Subject: Re: [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far > > On 27/09/11 16:03, Dan Magenheimer wrote: > > Note: This patch is also now in a git tree at: > > > > git://oss.oracle.com/git/djm/tmem.git#selfballoon-fix-v2 > > > > The balloon driver''s "current_pages" is very different from > > totalram_pages. Self-ballooning needs to be driven by > > the latter.Hi David -- Thanks for the feedback!> I don''t think this part of the change makes any difference. It looks like it > rearranges the maths without changing the end result (other than > slightly increasing the rate of change). > I think this (partial, untested) patch is equivalent:Actually it does. The key difference is the parameter to the call to balloon_set_new_target. The math in my patch is done in "internal" math (e.g. kernel-relevant variables) and the math in your patch is done in "external" math (e.g. Xen-relevant variables). Balloon_set_new_target requires "external" math, so I convert at the point of call.> The sysfs file isn''t documented (but then neither are any of the other > (self-)balloon driver sysfs files).Yep. This is a bug fix, so I''m not trying to fix all the sins of others (and myself). Since you are familiar with the meaning of all the core balloon driver variables exposed through sysfs, perhaps you might submit a patch to document them and/or suggest which ones should be in debugfs instead?> I don''t think "safety_margin" is the right name. Perhaps, > "min_reservation_ratio" or something like that?Yeah, I struggled with the name because the concept that the variable implements is pretty complex. I finally decided on safety_margin because I think it will draw the attention of a user who has reason to look for it. I don''t expect that it will be used anyway, but it is there in case I am wrong. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
David Vrabel
2011-Sep-27 17:08 UTC
[Xen-devel] Re: [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far
On 27/09/11 17:19, Dan Magenheimer wrote:>> From: David Vrabel [mailto:david.vrabel@citrix.com] >> Subject: Re: [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far >> >> On 27/09/11 16:03, Dan Magenheimer wrote: >>> Note: This patch is also now in a git tree at: >>> >>> git://oss.oracle.com/git/djm/tmem.git#selfballoon-fix-v2 >>> >>> The balloon driver''s "current_pages" is very different from >>> totalram_pages. Self-ballooning needs to be driven by >>> the latter. > > Hi David -- > > Thanks for the feedback! > >> I don''t think this part of the change makes any difference. It looks like it >> rearranges the maths without changing the end result (other than >> slightly increasing the rate of change). >> I think this (partial, untested) patch is equivalent: > > Actually it does.Really? Both patched and unpatched the new target, S, is (eventually): S = V + F + C - T where V is vm_committed_as, F is frontswap_curr_pages(), C is balloon_stats.current_pages, and T = totalram_pages. Perhaps the refactoring of the maths is a good idea (I don''t think so) but it shouldn''t be part of this patch and it shouldn''t be described as a fix. David _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2011-Sep-27 20:25 UTC
[Xen-devel] RE: [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far
> From: David Vrabel [mailto:david.vrabel@citrix.com] > Cc: Konrad Wilk; linux-kernel@vger.kernel.org; xen-devel@lists.xensource.com; Jeremy Fitzhardinge > Subject: Re: [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far > > On 27/09/11 17:19, Dan Magenheimer wrote: > >> From: David Vrabel [mailto:david.vrabel@citrix.com] > >> Subject: Re: [PATCH v2] xen: Fix selfballooning and ensure it doesn''t go too far > >> > >> On 27/09/11 16:03, Dan Magenheimer wrote: > >>> Note: This patch is also now in a git tree at: > >>> > >>> git://oss.oracle.com/git/djm/tmem.git#selfballoon-fix-v2 > >>> > >>> The balloon driver''s "current_pages" is very different from > >>> totalram_pages. Self-ballooning needs to be driven by > >>> the latter. > > > > Hi David -- > > > > Thanks for the feedback! > > > >> I don''t think this part of the change makes any difference. It looks like it > >> rearranges the maths without changing the end result (other than > >> slightly increasing the rate of change). > >> I think this (partial, untested) patch is equivalent: > > > > Actually it does. > > Really? > > Both patched and unpatched the new target, S, is (eventually): > > S = V + F + C - T > > where V is vm_committed_as, F is frontswap_curr_pages(), C is > balloon_stats.current_pages, and T = totalram_pages.Sorry, in my haste to shoot off a quick reply while my mind was somewhere else, I see my reply was poor and misleading. Yes, "S", the value passed to balloon_set_new_target(), is the same in most cases. However, it is V+F, not S, that must be compared against M (= the floor function); the "target" of selfballooning, the value that the kernel cares about (not the value that Xen cares about) is max(V+F,M). This gets converted to Xen-cares-about, IOW: S = max(V+F,M) + C - T where S is passed to balloon_set_new_target.> Perhaps the refactoring of the maths is a good idea (I don''t think so) > but it shouldn''t be part of this patch and it shouldn''t be described as > a fix.The refactored version makes sense now from a kernel perspective, though I can see how it might be confusing from a Xen perspective, especially to a balloon driver expert such as yourself. It is most definitely a fix because the formula is different and OOMs that previously happened no longer happen. I don''t think the commit comment describes the *refactoring* as a fix, just says that self-ballooning needs to be driven by kernel- cares-about values (even if it has to interface to the Xen balloon driver with a Xen-cares-about parameter). Hopefully that makes more sense? Dan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel