A quick status report on how 1.0rc8 behaved in service for a few hours with several hundred simultaneous users, at a site very new to dovecot. Oh, and a question at the end.

Summary: reasonable for a first attempt, but with one significant problem that required backing off.

Background: We have a long-established UW-IMAP service for a user population of about 20,000, based on a few Linux (Red Hat) machines running IMAP/POP and inbound delivery. We try to ensure, but cannot guarantee, that all activity for a given user takes place on one machine. Each machine mounts the INBOX area ("/var/spool/mail"; traditional UNIX mbox) via NFS with tight NFS arguments ("noac,actimeo=0", etc.) and similarly mounts the users' folder areas, which are subdirectories of their home directories. (We know that Mark Crispin recommends against NFS for UW-IMAP, but we seem to have been OK.) There is also some delivery processing:

    .forward -> "| procmail" -> folder-or-inbox

Each machine typically has several hundred simultaneous IMAP connections. This has basically worked well, but the UW-IMAP load has been heavy.

The plan: In an ideal world I would like to restructure the above. But our world is not ideal, so we have to stay with this structure; what we are looking for is a migration to dovecot that is transparent from the user's perspective.

The dovecot experience: Yesterday I quietly adjusted a DNS entry to redirect one of the live email hostnames at an additional machine in the "farm", running dovecot 1.0rc8, including deliver/LDA (and taking into account some post-rc7 dovecot changes in this area).

On an earlier, smaller-scale test, one problem had been some periods of "temporary authentication failures". Increasing "login_processes_count" and "login_max_processes_count" (each by a factor of 8) seems to have fixed this, and I'm not aware of any problems in that area yesterday.

It basically went well. But just over two hours later I had to back off, because of a significant dovecot problem, namely that dovecot crashed, almost silently. The only traces of this event in the log file seem to be:

    Oct 10 16:26:12 [...] dovecot: child 24525 (login) returned error 89
    Oct 10 16:26:14 [...] dovecot: Login process died too early - shutting down

Any thoughts? Any fixes? If the problem needs debugging (or additional data/log collection), how might that be attempted in this environment?

-- 
:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:                                           Durham University     :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :
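P.S. For concreteness, the login-process tuning mentioned above amounts to something like the following in dovecot.conf. This is only a sketch: the values are simply an eightfold scaling of what we believe are the rc8 defaults (3 and 128), not a recommendation.

    # Allow more concurrent login processes so connection bursts don't
    # produce "temporary authentication failure" responses.
    login_processes_count = 24          # login processes kept waiting for connections
    login_max_processes_count = 1024    # hard ceiling on login processes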
On Wed, 2006-10-11 at 15:54 +0100, David Lee wrote:
> It basically went well. But just over two hours later I had to back
> off, because of a significant dovecot problem, namely that dovecot
> crashed, almost silently. The only traces of this event in the log file
> seem to be:
>
> Oct 10 16:26:12 [...] dovecot: child 24525 (login) returned error 89
> Oct 10 16:26:14 [...] dovecot: Login process died too early - shutting down

Looks like this is happening to some people now. Unfortunately I can't really do anything with this little information.

There's a bug in logging, which I think is fixed by this patch:
http://dovecot.org/list/dovecot-cvs/2006-October/006473.html

After knowing what exactly the error is I could debug it further.
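For anyone wanting to try the logging fix before it lands in a release, applying it to an rc8 source tree is roughly as follows. This is a sketch only: the patch file name and the -p strip level are assumptions, so adjust them to however you saved the patch from the message linked above.

    # Assumes the patch was saved as logging-fix.patch at the top of the tree.
    cd dovecot-1.0.rc8
    patch -p0 < logging-fix.patch
    ./configure && make && make install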
On Wed, October 11, 2006 11:26 am, Timo Sirainen <tss at iki.fi> said:
> On Wed, 2006-10-11 at 15:54 +0100, David Lee wrote:
>> It basically went well. But just over two hours later I had to back
>> off, because of a significant dovecot problem, namely that dovecot
>> crashed, almost silently. The only traces of this event in the log file
>> seem to be:
>>
>> Oct 10 16:26:12 [...] dovecot: child 24525 (login) returned error 89
>> Oct 10 16:26:14 [...] dovecot: Login process died too early - shutting down
>
> Looks like this is happening to some people now. Unfortunately I can't
> really do anything with this little information.

I was seeing this too, with the CVS version just before RC8. I believe that when login_max_processes_count is reached (the default is 128), Dovecot crashes rather than handling that condition semi-gracefully.

I think this because we have a cluster of servers that do search indexing for our IMAP mailboxes, and due to a bug this week those servers started slamming one of our IMAP servers with rapid login failures. Dovecot crashed frequently with "Login process died too early - shutting down" while this was occurring. Under normal operation we were also seeing Dovecot crash about once per day on random servers whenever a server was under heavy IMAP load.

To work around this we switched to "login_process_per_connection = no" and bumped up our file descriptor limit ("ulimit -n"). No crashes since. But it probably still needs to be fixed in the code.

Bill
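P.S. For anyone trying the same workaround, a minimal sketch follows; the descriptor limit value is illustrative only, so size it to your peak connection count.

    # dovecot.conf: have each login process handle many connections
    # instead of forking a new process per connection.
    login_process_per_connection = no

    # In the shell or init script that starts dovecot, raise the per-process
    # file descriptor limit so the shared login processes have headroom.
    ulimit -n 8192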