Whit Blauvelt
2014-Apr-06 19:33 UTC
[Gluster-users] How should ps's VSZ and RSS be interpreted for Gluster?
Hi,

I've got a smallish Gluster 3.1.5 instance across two systems, with most of the service provided through NFS mounts. Yeah, that's old. But it's generally stable and there are other priorities ahead of upgrading it.

Recently it started to lag on directory listings on the box providing Gluster's NFS mounts. The current %CPU and %MEM readings in ps looked modest, but the VSZ and RSS readings (aka VIRT and RES in htop) were way up into what looked like impossibly high figures. Restarting Gluster brought those down, and I hope that will turn out to fix the sluggish directory listings (it seems to, but they were intermittent).

Trying to understand this better, I've found this article:

http://emilics.com/blog/article/mconsumption.html

So I have a rough idea, but I'm still not entirely clear on what it means when those VSZ and RSS values go way up for a process over many months, or on what defines sanity for Gluster's processes, as compared to the danger zone where I really had better restart the thing.

If someone can suggest a rule-of-thumb way to calculate the threshold of insanity for these, I'll probably write up a simple Nagios plugin to watch for that being approached, to remind me to restart Gluster before the users start complaining again.

Thanks,
Whit
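
For reference, the sort of Nagios plugin described above might look roughly like the sketch below. It is only an illustration under stated assumptions: it is Linux-only, it reads VmSize (what ps reports as VSZ) and VmRSS (RSS) from /proc/<pid>/status, and it alerts on RSS alone. The script name check_proc_mem.py and the argument order are made up for the example, and the warning and critical values are left as arguments because no sane threshold for Gluster is known here.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check on one process's VSZ and RSS (Linux only)."""
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_status(text):
    """Return (vsz_kb, rss_kb) parsed from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Values look like "   123456 kB"; keep the number only.
            fields[key] = int(value.split()[0])
    # Raises KeyError if a field is missing (e.g. for kernel threads).
    return fields["VmSize"], fields["VmRSS"]


def classify(rss_kb, warn_kb, crit_kb):
    """Map an RSS reading to a Nagios state. Thresholds are the caller's guess."""
    if rss_kb >= crit_kb:
        return CRITICAL
    if rss_kb >= warn_kb:
        return WARNING
    return OK


def main(argv):
    if len(argv) != 4:
        print("UNKNOWN - usage: check_proc_mem.py PID WARN_KB CRIT_KB")
        return UNKNOWN
    try:
        pid, warn_kb, crit_kb = (int(arg) for arg in argv[1:])
        with open("/proc/%d/status" % pid) as handle:
            vsz_kb, rss_kb = parse_status(handle.read())
    except (ValueError, KeyError, OSError) as exc:
        print("UNKNOWN - %r" % (exc,))
        return UNKNOWN
    state = classify(rss_kb, warn_kb, crit_kb)
    # Text before "|" is for humans; text after it is Nagios performance data.
    print("%s - rss=%dkB vsz=%dkB | rss=%dKB;%d;%d vsz=%dKB"
          % (LABELS[state], rss_kb, vsz_kb, rss_kb, warn_kb, crit_kb, vsz_kb))
    return state


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

It would be run as something like "check_proc_mem.py PID WARN_KB CRIT_KB", with the PID of whichever Gluster process is being watched. Graphing the performance data over a few months may be more useful than any fixed threshold, since steady growth in RSS is the pattern being asked about.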