Hi,

I've come across a problem with the "daily" security job. On an overloaded system with lots of ZFS datasets, lots of files, heavy system load and, to add insult to injury, a ZFS scrub going on, the finds issued by the periodic checks can take forever. They can take so long that I have found several lockfs waiting.

Is it sane to have an unlimited timeout for lockf? It would probably be better to have at least a configurable timeout for each category. It's really unlikely to see an overlap for a weekly or monthly job, but for daily jobs it would be good to have a sane default, say, an hour or two.

There's even a parameter in /etc/defaults/periodic.conf, but it seems it's not used right now:

# Max time to sleep to avoid causing congestion on download servers
anticongestion_sleeptime=3600

The alternative would be to have defaults for a sane timeout for each category, like

daily_lockf_timeout
weekly_lockf_timeout
monthly_lockf_timeout

Thoughts? It's pretty simple to do, and overlapping periodic jobs are really useless.

Borja.

On Tue, Oct 24, 2017 at 3:07 AM, Borja Marcos <borjam at sarenet.es> wrote:
>
> Hi,
>
> I've come across a problem with the "daily" security job. On an overloaded system with lots of ZFS datasets,
> lots of files, heavy system load and, to add insult to injury, a ZFS scrub going on, the finds issued by the
> periodic checks can take forever. They can take so long, I have found several lockfs waiting.
>
> Is it sane to have an unlimited timeout for lockf? Probably it would be better to have at least a configurable
> timeout for each category. It's really unlikely to see an overlap for a weekly or monthly job, but for daily
> jobs it would be good to have a sane default, say, an hour or two.
>
> There's even a parameter on /etc/defaults/periodic.conf but it seems it's not used right now.
>
> # Max time to sleep to avoid causing congestion on download servers
> anticongestion_sleeptime=3600
>
>
> The alternative would be to have defaults for a sane timeout for each category, like
>
> daily_lockf_timeout
> weekly_lockf_timeout
> monthly_lockf_timeout
>
> Thoughts? It's pretty simple to do and overlapping periodic jobs are really useless.

Are you talking about the lockf in /usr/sbin/periodic? It already has a timeout of 0, which should prevent overlapping periodic jobs. Or is there some other lockf involved? Without knowing which lockf you're talking about, I can't understand your problem.

The anticongestion_sleeptime variable is unrelated to lockf.

-Alan

> On 24 Oct 2017, at 16:41, Alan Somers <asomers at freebsd.org> wrote:
>
> On Tue, Oct 24, 2017 at 3:07 AM, Borja Marcos <borjam at sarenet.es> wrote:
> Are you talking about the lockf in /usr/sbin/periodic? It already has
> a timeout of 0, which should prevent overlapping periodic jobs. Or is
> there some other lockf involved? Without knowing which lockf you're
> talking about, I can't understand your problem.

Sorry, my explanation was awful now that I read it again.

Yes, I mean the lockf in /usr/sbin/periodic. And no, I didn't mean that jobs overlap (they certainly don't, thanks to the lockf), but they can pile up. Today I had a machine with three daily jobs waiting to start because the first one had been running for four days (a combination of lots of files and datasets, heavy system load, a ZFS pool almost full...).

The problem with a timeout of 0 is that it's unlimited. If something is wrong, you can end up with a growing queue of daily periodic jobs waiting to run. Imagine you have a very high system load for several days and for some reason the daily job won't complete. The next day a new daily job will try to start, but it will have to wait for the first one to finish. And so on.

The proposal is to replace the "0" timeout for lockf with a sane timeout, so that periodic will attempt to run the job but give up if it can't start within a reasonable time. The timeout shouldn't actually be long. If periodic must wait in order to start a job, it means that you have a serious performance problem, and it's pointless to keep your machine doing "find" 24/7.

Given the nature of the periodic jobs, I don't think it should be a problem to attempt to run them on a best-effort basis rather than guaranteeing that they will eventually run, even if awfully late.

I would add a configurable timeout for /usr/sbin/periodic. I think it's better done with a different variable for each class, and their default values can be 0 so that nothing changes.

daily_start_timeout
weekly_start_timeout
monthly_start_timeout

> The anticongestion_sleeptime variable is unrelated to lockf.

Understood, I stand corrected. I assumed it was.

I hope it's clearer now. It's pretty easy to do, but I'm interested in opinions on this matter :)

Thank you!

Borja.
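
To make the proposal concrete, here is a minimal sh sketch of how a per-class timeout could be looked up before being handed to lockf's -t option. It is not the actual /usr/sbin/periodic code: the *_start_timeout variables are only the names suggested in this thread (they are not existing periodic.conf settings), start_timeout_for is a hypothetical helper, and the values assigned are illustrative. The sketch only prints the lockf invocation instead of running it.

```sh
#!/bin/sh
# Hypothetical per-class settings, using the names proposed in this thread.
# They are NOT existing periodic.conf variables; the values are examples.
daily_start_timeout=7200
weekly_start_timeout=0
# monthly_start_timeout is deliberately left unset to show the fallback.

# Hypothetical helper: print the lock timeout (in seconds) for a periodic
# class such as "daily". Falls back to 0 when <class>_start_timeout is
# unset or empty, so the default behaviour stays as it is.
start_timeout_for() {
    class=${1##*/}    # accept "daily" or a path whose last component is "daily"
    eval "timeout=\${${class}_start_timeout:-0}"
    echo "$timeout"
}

for class in daily weekly monthly; do
    # A real implementation would pass this value to lockf -t;
    # the sketch only shows which value would be used.
    echo "$class: lockf -t $(start_timeout_for "$class")"
done
```

With a lookup like this, an unset variable resolves to 0, which matches the suggestion that the defaults should leave existing behaviour unchanged.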