> On 17 February 2016, at 17:45, Lowell Gilbert <freebsd-stable-local at be-well.ilk.org> wrote:
>
> Doug Hardie <bc979 at lafn.org> writes:
>
>>> On 17 February 2016, at 16:50, Lowell Gilbert <freebsd-stable-local at be-well.ilk.org> wrote:
>>>
>>> Have you measured that paging (not swapping; that's a more extreme
>>> measure where the whole process gets removed from memory) is a
>>> significant load on your system in a specific case? If not, I doubt that
>>> it's actually the case, and you're mitigating a non-existent problem
>>
>> I believe the question here is what is using up the swap space. From
>> what I have been able to find in a similar situation, malloc
>> will allocate swap space to back up memory and mmap will also allocate
>> swap if there is no backing file. procstat -v can be helpful in
>> chasing down some of those issues. However, I ended up guessing which
>> process it was by sequentially restarting processes and watching
>> swapinfo. I still have not been able to chase down what in that
>> process is using the space. There are no mmaps that are not file
>> backed so it must be a malloc. Finding the right one has been
>> elusive.
>
> Sure, but I'm pretty sure that the other worriers in this thread don't
> actually have any problem at all. I tried to poke a (Socratically
> limited) number of questions as a start of figuring out whether that's
> really the case, but thus far, I'd bet that it is. If that turns out to
> be a losing bet, I *will* spend time on fixing the code.
>
> Your observations are more useful, but I'm still not sure they indicate
> a problem that needs to be solved. There are clearly cases where
> significant quantities of swap can get used up storing copies of clean
> pages backing files on disk. Unless that slows down bringing in new
> pages that need to be read or written, I don't think that's a problem.
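
Following up on the procstat -v suggestion above, here is a rough sketch (Python; the helper name anon_bytes and the script name are hypothetical) that totals the address space of a process's anonymous mappings. It assumes the procstat -v header contains START, END and TP columns and that anonymous memory shows up with TP "sw" or "df"; check procstat(1) on your release before relying on that. The sizes are virtual, not swap actually in use, so the useful signal is which mappings keep growing between runs.

```python
"""Sketch: total the anonymous mappings that `procstat -v PID` reports.

Assumptions (verify against procstat(1) on your release): the first line is
a header containing START, END and TP columns, START/END are hex addresses,
and anonymous (swap-backable) memory has TP "sw" or "df".
"""
import subprocess
import sys

ANON_TYPES = {"sw", "df"}  # assumed TP codes for anonymous memory


def anon_bytes(procstat_output: str) -> int:
    """Sum END - START over mappings whose TP column is an anonymous type."""
    lines = [ln for ln in procstat_output.splitlines() if ln.strip()]
    header = lines[0].split()
    i_start, i_end, i_tp = (header.index(c) for c in ("START", "END", "TP"))
    total = 0
    for line in lines[1:]:
        fields = line.split()
        # PATH is the last column and is often empty for anonymous memory,
        # so every column up to TP still lines up with the header.
        if len(fields) > i_tp and fields[i_tp] in ANON_TYPES:
            total += int(fields[i_end], 16) - int(fields[i_start], 16)
    return total


# Run as: python3 anon_maps.py PID   (hypothetical script name)
if __name__ == "__main__" and len(sys.argv) == 2:
    pid = sys.argv[1]
    out = subprocess.run(["procstat", "-v", pid], capture_output=True,
                         text=True, check=True).stdout
    print(f"{anon_bytes(out) // 1024} KiB of anonymous mappings in PID {pid}")
```

Running it periodically against the suspect PID and comparing the totals is one way to confirm whether the growth is in anonymous (malloc-style) memory at all before digging into individual allocations.
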

Well, the problem is quite significant for me, in that eventually the system runs
out of swap and starts killing processes. It's not quite random, but I
haven't spent much time trying to figure out how it selects which ones to kill.
The specific system is unfortunately remote (about a 3-hour drive), and when sshd
gets killed, I have no option other than having someone go on site to reboot it.
I have had to start monitoring swap usage with Nagios and have it notify me
when swap is at 40% used. So far that has given me enough time to find an
internet connection and restart the process that is eating the swap. The
developer of that process claims that the problem must be in one of the library
functions it uses. That does seem reasonable as I am using that process on a
large number of systems and only one has the issue. However, until I can track
down which system call or malloc is eating up swap space, I don't really see
any way to fix the problem. I recently rebuilt that system with an updated
system and resized swap to be 10x memory. That does raise the mean time between
process kills, but doesn't eliminate the problem. About every other week
the alarm is raised now. Before it was pretty much every other day.
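
For the monitoring side, a sketch of a Nagios-style swap check built on swapinfo -k output, in the same hypothetical vein. It warns at 40% as described above; the 70% critical threshold is an assumed value. It relies only on the standard Nagios plugin convention of exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN plus a one-line status message, and it assumes swapinfo prints a header line followed by one line per device (and a "Total" line when there are several).

```python
"""Sketch of a Nagios-style swap check for FreeBSD.

Assumes `swapinfo -k` prints "Device 1K-blocks Used Avail Capacity", then one
line per swap device, plus a "Total" line when there is more than one device.
The 40% warning level mirrors the alert above; 70% critical is hypothetical.
"""
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def swap_used_percent(swapinfo_output: str) -> float:
    """Percentage of swap in use, summed over all listed devices."""
    total = used = 0
    for line in swapinfo_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Total":
            continue  # skip blank lines and the summary; sum devices ourselves
        total += int(fields[1])
        used += int(fields[2])
    if total == 0:
        raise ValueError("no swap devices found")
    return 100.0 * used / total


def status(percent: float, warn: float = 40.0, crit: float = 70.0) -> int:
    """Map a usage percentage to a Nagios exit code."""
    if percent >= crit:
        return CRITICAL
    if percent >= warn:
        return WARNING
    return OK


def main() -> int:
    try:
        out = subprocess.run(["swapinfo", "-k"], capture_output=True,
                             text=True, check=True).stdout
        pct = swap_used_percent(out)
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"SWAP UNKNOWN - {exc}")
        return UNKNOWN
    code = status(pct)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[code]
    print(f"SWAP {label} - {pct:.1f}% used")
    return code


# Only run the live check on FreeBSD, where swapinfo(8) exists.
if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    sys.exit(main())
```

Keeping the parsing separate from the subprocess call makes it easy to try the thresholds against saved swapinfo output before wiring the script into Nagios.
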