I'm using dsync for a regular backup. The backup system flocks so that
two cannot run at the same time, which is generally a good thing. The
problem is that it seems like dsync sometimes goes off into the weeds
and never comes back, leaving a process running and doing nothing
forever, hogging the lock and causing my backups never to run again. I
just finally figured out that what was causing the backups not to
run on this system was this process:

root 17836 0.0 0.0 40888 1600 ? S 2012 0:00 ssh -i /root/.ssh/backmaildir_id_rsa backmaildir at arg /usr/bin/dsync -u foobar server

yeah, that has been running since 2012 :(

root:/tmp# strace -p 17836
Process 17836 attached - interrupt to quit
select(8, [4], [], NULL, NULL

very exciting...

There doesn't seem to be a timeout in dsync, but perhaps there should
be? At this point my only option is to write a cronjob that will look
for dsync processes that are over a certain amount of time old and then
kill them, after I do that I will need to take a shower because that is
a very dirty solution :P

thanks for any ideas, or help!
micah

On 31.1.2013, at 0.06, Micah Anderson <micah at riseup.net> wrote:

> I'm using dsync for a regular backup. The backup system flocks so that
> two cannot run at the same time, which is generally a good thing. The
> problem is that it seems like dsync sometimes goes off into the weeds
> and never comes back, leaving a process running and doing nothing
> forever, hogging the lock and causing my backups never to run again. I
> just finally figured out that what was causing the backups not to
> run on this system was this process:
>
> root 17836 0.0 0.0 40888 1600 ? S 2012 0:00 ssh -i /root/.ssh/backmaildir_id_rsa backmaildir at arg /usr/bin/dsync -u foobar server
>
> yeah, that has been running since 2012 :(

So that's the ssh process. What about the dsync process that started it?
Does/did it exist?

> There doesn't seem to be a timeout in dsync, but perhaps there should
> be? At this point my only option is to write a cronjob that will look
> for dsync processes that are over a certain amount of time old and then
> kill them, after I do that I will need to take a shower because that is
> a very dirty solution :P

There is a 15-minute timeout in dsync after which it stops itself.
Normally the child process should also die. v2.2 will now make sure that
the child process dies:
http://hg.dovecot.org/dovecot-2.2/rev/070ca24e5846
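
For versions without that fix, one alternative to a cron job that hunts for old processes is to have the backup script enforce its own deadline. The following is only a rough sketch, not anything dsync or Dovecot provides: the helper name run_with_timeout and the 5-second grace period are made up for illustration. It starts the command in its own session and, on timeout, signals the whole process group, which would normally include an ssh child unless that child has moved itself into a different group.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s, grace_s=5):
    """Run cmd in its own session; kill its whole process group on timeout.

    Returns the exit code, or None if the command had to be killed.
    run_with_timeout and grace_s are hypothetical names for this sketch.
    """
    # start_new_session=True makes the child a session and process group
    # leader, so its pid doubles as the process group id.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        pass

    # Ask politely first, then force the issue after a short grace period.
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.killpg(proc.pid, sig)
        except ProcessLookupError:
            break  # the group is already gone
        try:
            proc.wait(timeout=grace_s)
            break
        except subprocess.TimeoutExpired:
            continue
    proc.wait()  # reap the child so no zombie is left behind
    return None
```

The backup job would call this with its own dsync command line and a deadline somewhat longer than dsync's 15 minutes, e.g. run_with_timeout(backup_cmd, 20 * 60), where backup_cmd is whatever the script already runs. Because the wrapper always returns, the flock is released even if the child hangs.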