Hello,

I have been testing autoballooning on a production Xen system today (with
cleancache + frontswap on Xen-provided tmem). For most of the idle or
CPU-centric VMs it seems to work just fine.

However, on one of the web-serving VMs there is also a cron job running
every few minutes which walks a rather large directory (and this directory
is on OCFS2, so this is a rather time-consuming process). As long as the
dcache/inode cache was large enough (which it was before, since the VM was
allocated 4 GB and only uses 1-2 GB most of the time), this was not a
problem.

Now, with self-ballooning, the memory gets reduced to somewhere between 1
and 2 GB, and after a few minutes the load goes through the ceiling. Jobs
reading through said directories pile up (stuck in D state, waiting for the
FS), and most of the time kswapd is spinning at 100%. If I deactivate
self-ballooning and assign the VM 3 GB, everything goes back to normal
after a few minutes (and "ls -l" on said directory is served from the cache
again).

Now, I am aware that this problem is partly self-made. The directory was
not actually supposed to contain that many files, and the next job not
waiting for the previous one to terminate is asking for trouble - but
still, I would consider this a possible regression, since it seems
self-ballooning is constantly thrashing the VM's caches. Not all caches can
be saved in cleancache.

What about an additional tunable: a user-specified amount of pages that is
added on top of the computed target number of pages? This way, one could
manually reserve a bit more room for other types of caches. (In fact, I
might try this myself, since it shouldn't be too hard to do.)

Any opinions on this?

Thank you,
Jana
> From: Jana Saout [mailto:jana@saout.de]
> Subject: [Xen-devel] Self-ballooning question / cache issue
>
> [...]
>
> What about an additional tunable: a user-specified amount of pages that
> is added on top of the computed target number of pages? This way, one
> could manually reserve a bit more room for other types of caches. (In
> fact, I might try this myself, since it shouldn't be too hard to do.)
>
> Any opinions on this?

Hi Jana --

Thanks for doing this analysis. While your workload is a bit unusual, I
agree that you have exposed a problem that will need to be resolved. It
was observed three years ago that the next "frontend" for tmem could
handle a cleancache-like mechanism for the dcache. Until now, I had
thought that this was purely optional and would yield only a small
performance improvement. But with your workload, I think the combination
of selfballooning forcing out dcache entries and those entries not being
saved in tmem is causing the problem you are seeing.

I think the best solution for this will be a "cleandcache" patch in the
Linux guest... but given how long it has taken to get cleancache and
frontswap into the kernel (and the fact that a working cleandcache patch
doesn't even exist yet), I wouldn't hold my breath ;-) I will put it on
the "to do" list though.

Your idea of the tunable is interesting (and patches are always welcome!)
but I am skeptical that it will solve the problem, since I would guess the
Linux kernel shrinks the dcache roughly in proportion to the size of the
page cache. So even with your "user-specified amount of pages that is
added on top of the computed target number of pages", the RAM will still
be shared across all caches and only some small portion of the added RAM
will likely be used for the dcache.

However, if you have a chance to try it, I would be interested in your
findings. Note that you can already set a permanent floor for
selfballooning ("min_usable_mb") or, of course, just turn off
selfballooning altogether.

Thanks,
Dan
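
P.S. For reference, min_usable_mb (like the other selfballoon tunables)
can be changed at runtime through sysfs. Assuming the selfballoon
attributes live in the usual place next to the Xen balloon device (the
exact path may differ depending on your kernel), something like

    echo 512 > /sys/devices/system/xen_memory/xen_memory0/selfballoon_min_usable_mb

(as root) would keep selfballooning from ever shrinking the guest below
512 MB of usable RAM.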
Hi Dan,

> > [...]
>
> Thanks for doing this analysis. While your workload is a bit unusual, I
> agree that you have exposed a problem that will need to be resolved. It
> was observed three years ago that the next "frontend" for tmem could
> handle a cleancache-like mechanism for the dcache. Until now, I had
> thought that this was purely optional and would yield only a small
> performance improvement. But with your workload, I think the combination
> of selfballooning forcing out dcache entries and those entries not being
> saved in tmem is causing the problem you are seeing.

Yes. In fact, I've been rolling out selfballooning across a development
system and most VMs were just fine with the defaults. The overall memory
savings from going from a static to a dynamic memory allocation is quite
significant - without the VMs having to resort to actual to-disk paging
when there is a sudden increase in memory usage. Quite nice.

Just for information: the filesystem this machine is using is OCFS2
(shared across 5 VMs) and the directory contains 45k files (*cough* - I'm
aware that's not optimal; I'm currently talking to the developer of that
application about not scanning the entire list of files every minute) -
so a scan takes a few minutes (especially stat'ing every file).

I have also been observing that kswapd seems rather busy at times on some
VMs, even when there is no actual swapping taking place (or could it be
frontswap, or just page reclaim?). This can be mitigated by increasing
the memory reserve a bit using my trivial test patch (see below).

> I think the best solution for this will be a "cleandcache" patch in the
> Linux guest... but given how long it has taken to get cleancache and
> frontswap into the kernel (and the fact that a working cleandcache patch
> doesn't even exist yet), I wouldn't hold my breath ;-) I will put it on
> the "to do" list though.

That sounds nice!

> Your idea of the tunable is interesting (and patches are always welcome!)
> but I am skeptical that it will solve the problem, since I would guess the
> Linux kernel shrinks the dcache roughly in proportion to the size of the
> page cache. So even with your "user-specified amount of pages that is
> added on top of the computed target number of pages", the RAM will still
> be shared across all caches and only some small portion of the added RAM
> will likely be used for the dcache.

That's true. In fact, I have to add about 1 GB of memory in order to keep
the relevant dcache / inode cache entries in the cache. When I do that,
the largest portion of memory is still eaten up by the regular page
cache. So this is more of a workaround than a solution, but for now it
works.

I've attached the simple patch I've whipped up below.

> However, if you have a chance to try it, I would be interested in your
> findings. Note that you can already set a permanent floor for
> selfballooning ("min_usable_mb") or, of course, just turn off
> selfballooning altogether.

Sure, that's always a possibility. However, the VM already had an overly
large amount of memory before to avoid the problem. Now it runs with less
memory (still a bit more than required), and when a load spike comes, it
can quickly balloon up, which is exactly what I was looking for.

Jana

----
Author: Jana Saout <jana@saout.de>
Date:   Sun Apr 29 22:09:29 2012 +0200

    Add selfballooning memory reservation tunable.

diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c
index 146c948..7d041cb 100644
--- a/drivers/xen/xen-selfballoon.c
+++ b/drivers/xen/xen-selfballoon.c
@@ -105,6 +105,12 @@ static unsigned int selfballoon_interval __read_mostly = 5;
  */
 static unsigned int selfballoon_min_usable_mb;
 
+/*
+ * Amount of RAM in MB to add to the target number of pages.
+ * Can be used to reserve some more room for caches and the like.
+ */
+static unsigned int selfballoon_reserved_mb;
+
 static void selfballoon_process(struct work_struct *work);
 static DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process);
 
@@ -217,7 +223,8 @@ static void selfballoon_process(struct work_struct *work)
 		cur_pages = totalram_pages;
 		tgt_pages = cur_pages; /* default is no change */
 		goal_pages = percpu_counter_read_positive(&vm_committed_as) +
-				totalreserve_pages;
+				totalreserve_pages +
+				MB2PAGES(selfballoon_reserved_mb);
 #ifdef CONFIG_FRONTSWAP
 		/* allow space for frontswap pages to be repatriated */
 		if (frontswap_selfshrinking && frontswap_enabled)
@@ -397,6 +404,30 @@ static DEVICE_ATTR(selfballoon_min_usable_mb, S_IRUGO | S_IWUSR,
 		   show_selfballoon_min_usable_mb,
 		   store_selfballoon_min_usable_mb);
 
+SELFBALLOON_SHOW(selfballoon_reserved_mb, "%d\n",
+		 selfballoon_reserved_mb);
+
+static ssize_t store_selfballoon_reserved_mb(struct device *dev,
+					     struct device_attribute *attr,
+					     const char *buf,
+					     size_t count)
+{
+	unsigned long val;
+	int err;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	err = strict_strtoul(buf, 10, &val);
+	if (err)
+		return -EINVAL;
+	selfballoon_reserved_mb = val;
+	return count;
+}
+
+static DEVICE_ATTR(selfballoon_reserved_mb, S_IRUGO | S_IWUSR,
+		   show_selfballoon_reserved_mb,
+		   store_selfballoon_reserved_mb);
+
 
 #ifdef CONFIG_FRONTSWAP
 SELFBALLOON_SHOW(frontswap_selfshrinking, "%d\n", frontswap_selfshrinking);
@@ -480,6 +511,7 @@ static struct attribute *selfballoon_attrs[] = {
 	&dev_attr_selfballoon_downhysteresis.attr,
 	&dev_attr_selfballoon_uphysteresis.attr,
 	&dev_attr_selfballoon_min_usable_mb.attr,
+	&dev_attr_selfballoon_reserved_mb.attr,
 #ifdef CONFIG_FRONTSWAP
 	&dev_attr_frontswap_selfshrinking.attr,
 	&dev_attr_frontswap_hysteresis.attr,
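
(Usage note: the new tunable appears alongside the existing selfballoon
knobs in sysfs. Assuming the attributes sit in the usual location for the
Xen balloon device - the exact path may differ depending on the kernel -
reserving an extra gigabyte on top of the computed target looks like

    echo 1024 > /sys/devices/system/xen_memory/xen_memory0/selfballoon_reserved_mb

and reading the file back returns the currently configured reserve in MB.
Writing it requires CAP_SYS_ADMIN.)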
> From: Jana Saout [mailto:jana@saout.de]
> Subject: Re: [Xen-devel] Self-ballooning question / cache issue

Hi Jana --

Since you have tested this patch and have found it useful, and since its
use is entirely optional, it is OK with me for it to be upstreamed at the
next window. Konrad cc'ed.

You will need to add a Signed-off-by line to the patch but other than
that you can consider it

Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>

> > [...]
>
> That's true. In fact, I have to add about 1 GB of memory in order to keep
> the relevant dcache / inode cache entries in the cache. When I do that,
> the largest portion of memory is still eaten up by the regular page
> cache. So this is more of a workaround than a solution, but for now it
> works.
>
> I've attached the simple patch I've whipped up below.
>
> [...]
On Wed, May 02, 2012 at 10:51:12AM -0700, Dan Magenheimer wrote:
> > From: Jana Saout [mailto:jana@saout.de]
> > Subject: Re: [Xen-devel] Self-ballooning question / cache issue
>
> Hi Jana --
>
> Since you have tested this patch and have found it useful, and since its
> use is entirely optional, it is OK with me for it to be upstreamed at the
> next window. Konrad cc'ed.
>
> You will need to add a Signed-off-by line to the patch but other than
> that you can consider it
>
> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>

Looks good. Can you resend it with the right tags to xen-devel and lkml
and to me please?

> > [...]