thr3ads.net - CentOS - [CentOS] Effectiveness of CentOS vm.swappiness [Jun 2015]

Markus "Shorty" Uckelmann

2015-Jun-05 19:09 UTC

[CentOS] Effectiveness of CentOS vm.swappiness

On 05.06.2015 at 18:33, Gordon Messmer wrote:
> On 06/05/2015 03:29 AM, Markus "Shorty" Uckelmann wrote:
>> some (probably unused) parts are swapped out. But, some of
>> those parts are the salt-minion, php-fpm or mysqld. All services which
>> are important for us and which suffer badly from being swapped out.
>
> Those two things can't really both be true.  If the pages swapped out
> are unused, then the application won't suffer as a result.
Why not? If you have an application which sees action only every 12 to
24 hours, I think this can happen. Our salt-minion would be a candidate
for this. Although we constantly check whether it's alive, we only do
something "heavy" like a deployment once or twice a day. And very often we
have to run those deployments twice, because the first time we get a lot
of timeouts. Sure, it might be the software itself. But I think it could
be because of swapped-out pages.

I can't be sure about this. That's why I want to find out what and why 
things are happening. But first I need to find the right tools to do this ;)

Cheers, Shorty

Gordon Messmer

2015-Jun-05 21:32 UTC


On 06/05/2015 12:09 PM, Markus "Shorty" Uckelmann wrote:
> On 05.06.2015 at 18:33, Gordon Messmer wrote:
>> On 06/05/2015 03:29 AM, Markus "Shorty" Uckelmann wrote:
>>> some (probably unused) parts are swapped out. But, some of
>>> those parts are the salt-minion, php-fpm or mysqld. All services which
>>> are important for us and which suffer badly from being swapped out.
>>
>> Those two things can't really both be true.  If the pages swapped out
>> are unused, then the application won't suffer as a result.
>
> Why not? If you have an application which sees action only every 12 to
> 24 hours, I think this can happen.
Well, that's not "unused," then.

To measure the swap use of your processes, install "smem".  It will show
you the amount of swap that each process is using.

For more specific information, make a copy of /proc/<pid>/smaps.
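
As a rough illustration of what can be read from that file, here is a minimal Python sketch that sums the "Swap:" fields of an smaps dump. The function names are made up for this example, and it assumes the usual "Swap:   <n> kB" line format; reading another user's process normally requires root.

```python
import re
from pathlib import Path

# Matches the per-mapping "Swap:" field only; "SwapPss:" lines are skipped.
SWAP_LINE = re.compile(r"^Swap:\s+(\d+)\s+kB$", re.MULTILINE)


def swap_kb_from_smaps(smaps_text: str) -> int:
    """Return the sum of all 'Swap:' fields (in kB) found in smaps text."""
    return sum(int(kb) for kb in SWAP_LINE.findall(smaps_text))


def swap_kb_for_pid(pid: int) -> int:
    """Read /proc/<pid>/smaps and return the swapped-out total in kB."""
    return swap_kb_from_smaps(Path(f"/proc/{pid}/smaps").read_text())
```

Calling swap_kb_for_pid() with the PID of mysqld or the salt-minion should give roughly the per-process figure that smem reports in its swap column.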

To quantify your problem, let bacula run then save the output of smem, 
or /proc/<pid>/smaps for each of your critical services, or both, and 
then access each of the services and quantify the latency relative to 
the normal latency.  Finally, after collecting latency information, get 
the output of smem and/or /proc/<pid>/smaps again.  You can compare swap 
use before and after accessing the service to see how much was swapped 
out beforehand (presumably because of the backup), and how much had to 
be recovered for your test query.

I'd suggest collecting that information at the normal swappiness setting 
and at 0.
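
The before/after comparison described above could be scripted along these lines. This is only a sketch: the helper names are hypothetical, the per-service swap figures are assumed to have been collected already (for example from smem or smaps), and the probe is whatever test query you choose.

```python
import time
from typing import Callable, Dict


def timed(probe: Callable[[], object]) -> float:
    """Run one test query and return its wall-clock latency in seconds."""
    start = time.monotonic()
    probe()
    return time.monotonic() - start


def paged_back_in_kb(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per service, how many kB left swap between the two snapshots.

    Both arguments map a service name to its swap use in kB. Services
    missing from either snapshot (e.g. restarted in between) are skipped.
    """
    return {name: before[name] - after[name] for name in before if name in after}
```

Running the same measurement once at the normal swappiness setting and once at 0 gives two sets of numbers that can be compared directly.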

If the kernel is swapping out processes in favor of filesystem cache 
when swappiness is 0, I believe that would be a bug, and should be 
reported to the kernel developers.
> Our salt-minion would be a candidate
> for this. Although we constantly check whether it's alive, we only do
> something "heavy" like a deployment once or twice a day. And very often we
> have to run those deployments twice, because the first time we get a lot
> of timeouts. Sure, it might be the software itself. But I think it could
> be because of swapped-out pages.
"Timeouts" is pretty vague.  Very generally, it's possible that you have
a timeout configured somewhere that is failing on the first run because 
the filesystem cache now contains content from your backup, and your 
process only completes in time when the files needed for the deployment 
are in the filesystem cache.  That's a stretch as far as explanations 
go, but if that is the case, then swappiness isn't going to fix the 
problem.  You need to fix your timeout so that it allows enough time for 
the deployment to finish when the server is cold booted (using no 
cache), or prime your caches before doing deployments.
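
One simple way to prime the filesystem cache is to read the relevant files once before the deployment starts. The sketch below assumes the files live under a single directory tree; the function name and the idea of passing a root directory are illustrative only, and which directories matter depends on your setup.

```python
import os


def warm_file_cache(root: str, chunk_size: int = 1 << 20) -> int:
    """Read every regular file under root once and return the bytes read.

    Reading a file pulls its contents into the page cache, so later
    accesses are less likely to wait on disk.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs and broken symlinks
            try:
                with open(path, "rb") as fh:
                    while chunk := fh.read(chunk_size):
                        total += len(chunk)
            except OSError:
                continue  # unreadable file: ignore and carry on
    return total
```

Note that this only helps if the cache, rather than swapped-out process memory, is the bottleneck, and the kernel may still evict the pages again under memory pressure.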

Markus "Shorty" Uckelmann

2015-Jun-06 09:23 UTC


On 05.06.2015 at 23:32, Gordon Messmer wrote:
>>> Those two things can't really both be true.  If the pages
>>> swapped out are unused, then the application won't suffer as a
>>> result.
>>
>> Why not? If you have an application which sees action only every
>> 12 to 24 hours, I think this can happen.
>
> Well, that's not "unused," then.
In a manner of speaking it's not "unused". But in the case of "rarely
used" it is possible that parts of the program which are needed are in
swap.
> To measure the swap use of your processes, install "smem".  It will
> show you the amount of swap that each process is using.
Brilliant! Until now I was using the script under [1].
> For more specific information, make a copy of /proc/<pid>/smaps.
>
> To quantify your problem, let bacula run then save the output of
> smem, or /proc/<pid>/smaps for each of your critical services, or
> both, and then access each of the services and quantify the latency
> relative to the normal latency.  Finally, after collecting latency
> information, get the output of smem and/or /proc/<pid>/smaps again.
> You can compare swap use before and after accessing the service to
> see how much was swapped out beforehand (presumably because of the
> backup), and how much had to be recovered for your test query.
>
> I'd suggest collecting that information at the normal swappiness
> setting and at 0.
Thank you, this will get me further.
> If the kernel is swapping out processes in favor of filesystem
> cache when swappiness is 0, I believe that would be a bug, and
> should be reported to the kernel developers.
Because of what I read in [2], I'm not planning to use 0 but rather 1.
Please correct me if I'm wrong.
> "Timeouts" is pretty vague.  Very generally, it's possible that you
> have a timeout configured somewhere that is failing on the first
> run because the filesystem cache now contains content from your
> backup, and your process only completes in time when the files
> needed for the deployment are in the filesystem cache.  That's a
> stretch as far as explanations go, but if that is the case, then
> swappiness isn't going to fix the problem.  You need to fix your
> timeout so that it allows enough time for the deployment to finish
> when the server is cold booted (using no cache), or prime your
> caches before doing deployments.
By timeouts I meant that the salt master tries to contact the
salt-minion to send it the payload. That is where the timeout happens:
the minion doesn't get back to the master within the configured
timeout, which is currently set to 20 seconds. When we start a job for
the first time after several hours, we get a lot of timeouts. A second
run mostly helps. I think it is possible that parts of the minion
process which are needed for the payload we send it are swapped out. On
the first run it takes too long to get the pages back into RAM. But
eventually all pages are paged in, so the second run works. But this is
just an assumption. On Monday I'll try to find out if I'm right or
wrong.

BTW: Is there a way to find out which parts of a program are swapped
out without using monsters like Valgrind? Damn, sounds like an
interesting start of the week...
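
One possible starting point, sketched here under assumptions rather than tested against the minion: /proc/<pid>/smaps lists a "Swap:" value for every mapping, so grouping by mapping shows whether the swapped pages belong to the heap, the stack, anonymous memory or a particular library. The function name and the "[anonymous]" label are made up for this example.

```python
import re
from typing import List, Tuple

# Mapping header: "start-end perms offset dev inode [pathname]"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")
SWAP = re.compile(r"^Swap:\s+(\d+)\s+kB$")


def swapped_mappings(smaps_text: str) -> List[Tuple[str, int]]:
    """Return (mapping name, swapped kB) pairs, largest first.

    Mappings with nothing in swap are left out; mappings without a
    pathname are reported as "[anonymous]".
    """
    result = []
    current = None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1) or "[anonymous]"
            continue
        swap = SWAP.match(line)
        if swap and current is not None and int(swap.group(1)) > 0:
            result.append((current, int(swap.group(1))))
    return sorted(result, key=lambda item: item[1], reverse=True)
```

This only tells you which mappings have pages in swap, not which functions or objects inside them, so it narrows the question down rather than answering it fully.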


[1] http://northernmost.org/blog/find-out-what-is-using-your-swap/
[2] http://www.mysqlperformanceblog.com/2014/04/28/oom-relation-vm-swappiness0-new-kernel/


Cheers to all for the feedback and help,
Shorty
