Dovecot (v1.1.rc8) died tonight, with an error about time moving
backwards by 4398 seconds. I can see from logs that this has happened a
few times before with the imap processes, without me noticing. I sure
noticed the master process missing, though :-).

I was puzzled that it was always 4398 seconds, in particular because
this server runs an NTP daemon. A little searching for this problem
shows that it is an issue with the Linux kernel gettimeofday(), see
e.g. http://lkml.org/lkml/2007/8/23/96

Below is a patch (untested) to work around this issue. Do you see
something wrong with this approach, apart from the ugliness?

I just picked the 4395-4400 values by chance. Can you figure out how big
the window should be?

Thanks,
Anders.

--- ./src/lib/ioloop.c-orig	2008-06-20 10:45:54.000000000 +0200
+++ ./src/lib/ioloop.c	2008-06-20 10:47:36.000000000 +0200
@@ -230,8 +230,13 @@
 	struct timeval tv, tv_call;
 	unsigned int t_id;
 
-	if (gettimeofday(&ioloop_timeval, &ioloop_timezone) < 0)
-		i_fatal("gettimeofday(): %m");
+	/* The Linux gettimeofday() will sometimes jump forward
+	 * by approximately 4398 seconds. Ignore that reading. */
+	do {
+		if (gettimeofday(&ioloop_timeval, &ioloop_timezone) < 0)
+			i_fatal("gettimeofday(): %m");
+	} while (4395 < (ioloop_timeval.tv_sec - ioloop_time)
+		 && (ioloop_timeval.tv_sec - ioloop_time) < 4400);
 
 	/* Don't bother comparing usecs. */
 	if (ioloop_time > ioloop_timeval.tv_sec) {

On Fri, 2008-06-20 at 10:53 +0200, Anders wrote:
> Dovecot (v1.1.rc8) died tonight, with an error about time moving
> backwards by 4398 seconds. I can see from logs that this has happened a
> few times before with the imap processes, without me noticing. I sure
> noticed the master process missing, though :-).
>
> I was puzzled that it was always 4398 seconds, in particular because
> this server runs an NTP daemon. A little searching for this problem
> shows that it is an issue with the Linux kernel gettimeofday(), see
> e.g. http://lkml.org/lkml/2007/8/23/96

The thread puts it down to buggy hardware and puts a workaround into the
kernel where it belongs, not in dovecot.

johannes

Johannes Berg <johannes at sipsolutions.net> writes:
> On Fri, 2008-06-20 at 10:53 +0200, Anders wrote:
>>
>> I was puzzled that it was always 4398 seconds, in particular because
>> this server runs an NTP daemon. A little searching for this problem
>> shows that it is an issue with the Linux kernel gettimeofday(), see
>> e.g. http://lkml.org/lkml/2007/8/23/96
>
> The thread puts it down to buggy hardware and puts a workaround into the
> kernel where it belongs, not in dovecot.

That's not helpful. By that reasoning, the entire "time moved backwards"
check does not belong in Dovecot.

Anyway, I was not proposing the patch for inclusion, just asking for
advice as to whether it would be safe. I even noted that it was ugly.

As I am already compiling Dovecot myself, I prefer a patch there, rather
than diverging from the distribution kernel.

Cheers,
Anders.

On Fri, 2008-06-20 at 10:53 +0200, Anders wrote:
> Dovecot (v1.1.rc8) died tonight, with an error about time moving
> backwards by 4398 seconds. I can see from logs that this has happened a
> few times before with the imap processes, without me noticing. I sure
> noticed the master process missing, though :-).
>
> I was puzzled that it was always 4398 seconds, in particular because
> this server runs an NTP daemon. A little searching for this problem
> shows that it is an issue with the Linux kernel gettimeofday(), see
> e.g. http://lkml.org/lkml/2007/8/23/96
>
> Below is a patch (untested) to work around this issue. Do you see
> something wrong with this approach, apart from the ugliness?

The only problem I can see is that if there's a legitimate jump of 4395
seconds it'll busy-loop for 5 seconds before continuing. Probably not
very likely to happen.
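
One way to sidestep the busy-loop concern would be to cap the number of re-reads. The following is only a minimal sketch under stated assumptions, not Dovecot code: `is_suspect_jump()`, `read_time_skipping_glitch()` and `MAX_RETRIES` are hypothetical names invented for illustration, and the 4395-4400 window is the arbitrary one from the patch in this thread.

```c
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Window taken from the patch in this thread; the bounds were picked by chance. */
#define GLITCH_MIN_SECS 4395
#define GLITCH_MAX_SECS 4400
/* Hypothetical retry limit, so a legitimate jump cannot stall the caller. */
#define MAX_RETRIES 3

/* Returns 1 if the forward jump from prev to now falls strictly inside
   the suspect window, 0 otherwise (including backward jumps). */
static int is_suspect_jump(time_t prev, time_t now)
{
	long diff = (long)(now - prev);

	return diff > GLITCH_MIN_SECS && diff < GLITCH_MAX_SECS;
}

/* Hypothetical helper: read the clock, re-reading at most MAX_RETRIES
   times while the reading looks like the glitch, then accept whatever
   was read last. Returns 0 on success, -1 if gettimeofday() fails. */
static int read_time_skipping_glitch(time_t prev, struct timeval *tv_r)
{
	int i;

	for (i = 0; i <= MAX_RETRIES; i++) {
		if (gettimeofday(tv_r, NULL) < 0)
			return -1;
		if (!is_suspect_jump(prev, tv_r->tv_sec))
			break;
	}
	return 0;
}
```

With this shape, the caller would pass the previously recorded time (`ioloop_time` in the patch) as `prev`. A genuine jump that lands inside the window costs a few extra system calls rather than several seconds of spinning, at the price of accepting a bad reading if the glitch persists across all retries.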