> On 24 Oct 2017, at 16:41, Alan Somers <asomers at freebsd.org> wrote:
>
> On Tue, Oct 24, 2017 at 3:07 AM, Borja Marcos <borjam at sarenet.es> wrote:
> Are you talking about the lockf in /usr/sbin/periodic? It already has
> a timeout of 0, which should prevent overlapping periodic jobs. Or is
> there some other lockf involved? Without knowing which lockf you're
> talking about, I can't understand your problem.

Sorry, my explanation was awful now that I read it again. Yes, I mean the lockf in /usr/sbin/periodic. And no, I didn't mean that jobs overlap (certainly they don't, thanks to the lockf), but they can pile up. Today I had a machine with three daily jobs waiting to start because the first one had been running for four days (a combination of lots of files and datasets, heavy system load, ZFS pool almost full...).

The problem with a timeout of 0 is that it's unlimited. If something goes wrong, you can end up with a growing queue of daily periodic jobs waiting to run. Imagine you have a very high system load for several days and for some reason the daily job won't complete. The next day a new daily job will try to start, but it will have to wait for the first one to finish. And so on.

The proposal is to replace the "0" timeout for lockf with a sane timeout, so that periodic will attempt to run the job but give up if it can't be started within a reasonable time. The timeout shouldn't be long, actually. If periodic must wait in order to start a job, it means that you have a serious performance problem, and it's pointless to keep your machine doing "find" 24/7.

Given the nature of the periodic jobs, I don't think it should be a problem to run them on a best-effort basis rather than guaranteeing that they will eventually run, even if awfully late.

I would add a configurable timeout for /usr/sbin/periodic. I think it's better done with a different variable for each class, and their default values can be 0 so that nothing changes:

daily_start_timeout
weekly_start_timeout
monthly_start_timeout

> The anticongestion_sleeptime variable is unrelated to lockf.

Understood, I stand corrected. I assumed it was.

Hope it's better now. It's pretty easy to do, but I'm interested in opinions on this matter :)

Thank you!

Borja.
On Tue, 2017-10-24 at 17:06 +0200, Borja Marcos wrote:
>
> > On 24 Oct 2017, at 16:41, Alan Somers <asomers at freebsd.org> wrote:
> >
> > On Tue, Oct 24, 2017 at 3:07 AM, Borja Marcos <borjam at sarenet.es> wrote:
> > Are you talking about the lockf in /usr/sbin/periodic? It already has
> > a timeout of 0, which should prevent overlapping periodic jobs. Or is
> > there some other lockf involved? Without knowing which lockf you're
> > talking about, I can't understand your problem.
> Sorry, my explanation was awful now that I read it again. Yes, I mean the lockf in /usr/sbin/periodic. And
> no, I didn't mean that jobs overlap (certainly they don't thanks to the lockf) but they can pile up. Today I had
> a machine with three daily jobs waiting to start because the first one had been running for four days (a combination
> of lots of files and datasets, heavy system load, ZFS pool almost full...)
>
> The problem with a timeout of 0 is that it's unlimited.

No, lockf -t 0 means to exit without waiting, with status EX_TEMPFAIL,
if the lock cannot be acquired immediately. In light of that, the rest
of your report/request doesn't make sense. Jobs won't stack up,
they'll fail if the prior one is still running.

-- Ian

> In case something is wrong you can end up with a growing queue of
> daily periodic jobs waiting to run. Imagine you have a very high system load for several days and for some reason the daily job
> won't complete. Next day a new daily job will try to start but it will have to wait for the first one to finish. And so on.
>
> The proposal is to replace the "0" timeout for lockf with a sane timeout so that it will attempt to run it, but give up in
> case it can't be done in a reasonable time. The timeout shouldn't be long actually. If periodic must wait in order to
> start a job it means that you have a serious performance problem and it's pointless to keep your machine doing "find"
> 24/7.
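The fail-fast behaviour Ian describes can be illustrated with a small sketch. It is written in Python rather than taken from the lockf(1) source, and it assumes that a non-blocking exclusive flock(2) lock (LOCK_EX | LOCK_NB) is a fair stand-in for `lockf -t 0`; the function name `run_locked_nowait` and the lock path in the test are hypothetical. A caller that finds the lock already held returns EX_TEMPFAIL (75) at once instead of waiting.

```python
import fcntl
import os


def run_locked_nowait(lock_path, job):
    """Run job() while holding an exclusive lock on lock_path (hypothetical helper).

    Assumption: a non-blocking flock() approximates `lockf -t 0`. If another
    holder already has the lock, give up immediately with EX_TEMPFAIL rather
    than waiting, so a second run never queues behind the first one.
    """
    fd = os.open(lock_path, os.O_RDONLY | os.O_CREAT, 0o666)
    try:
        try:
            # LOCK_NB: fail with BlockingIOError instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return os.EX_TEMPFAIL  # 75, same status lockf(1) reports
        job()
        return os.EX_OK
    finally:
        # Closing the descriptor releases the flock() lock held on it.
        os.close(fd)
```

Under these semantics a second daily run started while the first is still going exits immediately, which is Ian's point: jobs fail rather than stack up.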
> On 24 Oct 2017, at 17:25, Ian Lepore <ian at freebsd.org> wrote:
>
> No, lockf -t 0 means to exit without waiting, with status EX_TEMPFAIL,
> if the lock cannot be acquired immediately. In light of that, the rest
> of your report/request doesn't make sense. Jobs won't stack up,
> they'll fail if the prior one is still running.

True! Then I don't understand what happened. When I saw several processes waiting, I assumed it was an unlimited wait. My apologies, I need to look into it more closely.

Borja.