JB Zimmerman
2006-Nov-01 21:28 UTC
[Dovecot] CRASH: mail-cache-fields.c crash - new info, hacked 'solution'
I'm baaaaack. :-)

I've managed to implement a suggestion from Hans Morten Kind on this list that seems to have stopped the crashing. However, my hack (commenting out a call to i_unreached()) makes me queasy, because I have no idea what its ramifications are (I don't habitually code, myself). So I wanted to lay it out for y'all in case this is a problem that you feel should be looked at.

So, here's my situation. I'm using dovecot-1.0.rc10 downloaded from dovecot.org. I built an RPM locally on my machine (as opposed to prior attempts, which used the AT RPMs), which is running RHEL 4AS with all updates. I did *not* configure in postgres, mysql, sqlite or ldap-auth. Other than that, it's stock (openssl included, e.g.), with some file locations taken from the AT RPMs' spec (Red Hat-specific file locations). No patches were applied. The SPEC is available if y'all think it'd help; the RPM built with no complaints and installed the same way.

We're using Maildir format, upgraded from a Courier install, so the layout is the .folder.subfolder structure.

Error behavior: when a user attempted to open a folder containing a large number of messages (roughly 100k+ messages, as far as we can tell), they immediately got an error saying the server had disconnected. On the server side, I got this in the log (hostname 'magneto', obviously):

---cut---
Nov 1 15:18:16 magneto dovecot: IMAP(joeuser): file mail-cache-fields.c: line 26: unreached
Nov 1 15:18:16 magneto dovecot: child 17599 (imap) killed with signal 6
---cut---

Now, the folder in question is a folder of CVS commit messages (hence the size). If I go into the folder ("/home/joeuser/Maildir/.GNOME CVS commits/"), do 'rm -f dovecot-*', and then have the user try again, they can open the folder and get a message list; dovecot rebuilds the various index files. However, as soon as they click on an individual message, bam, the same error behavior. From then on, they can't get into the folder again unless we remove their dovecot files again.
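For anyone wondering how the "unreached" log line relates to "killed with signal 6", here is a minimal C sketch of the general pattern. It is not Dovecot's code: the enum, the sketch_unreached() macro and sketch_has_fixed_size() are hypothetical names made up for illustration, and the abort_on_unknown flag only exists so both behaviors can be shown side by side. It assumes the assertion ends in abort(), which raises SIGABRT (signal 6 on Linux).

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the cache field types; names are illustrative. */
enum sketch_field_type {
	SKETCH_FIELD_FIXED_SIZE,
	SKETCH_FIELD_VARIABLE_SIZE,
	SKETCH_FIELD_STRING
};

/* Hypothetical stand-in for an "unreached" assertion: report the file and
   line, then abort(). abort() raises SIGABRT, which is signal 6 on Linux. */
#define sketch_unreached() \
	do { \
		fprintf(stderr, "file %s: line %d: unreached\n", \
			__FILE__, __LINE__); \
		abort(); \
	} while (0)

/* Returns true for fixed-size field types. A value outside the enum falls
   out of the switch: with abort_on_unknown set, the process aborts; without
   it (the "commented out" case), the function quietly returns false. */
static bool sketch_has_fixed_size(int raw_type, bool abort_on_unknown)
{
	switch (raw_type) {
	case SKETCH_FIELD_FIXED_SIZE:
		return true;
	case SKETCH_FIELD_VARIABLE_SIZE:
	case SKETCH_FIELD_STRING:
		return false;
	}

	if (abort_on_unknown)
		sketch_unreached();
	return false;
}
```

With abort_on_unknown set to false, an unexpected value such as 99 is treated as "not fixed size" instead of killing the process, which is roughly what commenting out the assertion does. Whether that is safe depends on what the caller then does with the value, so this only illustrates the control flow.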
We tried this using Evolution, mutt and pine as the clients. All exhibited identical behavior. This is coming over TLS.

NOW THE FIX: I made a change to the source (gasp!) whose ramifications I honestly don't understand, but it has...well, not *fixed*, but sorta fixed it. As per Hans Morten Kind, I commented out the i_unreached() call in field_has_fixed_size(). After this, the mail is readable, as is the folder list, but now there is an error message in the log.

First things first, here's the change I made to dovecot-1.0.rc10/src/lib-index/mail-cache-fields.c:

---cut---
@@ -23,7 +23,7 @@
 		return FALSE;
 	}
 
-	i_unreached();
+/*	i_unreached(); */
 	return FALSE;
 }
---cut---

...and here's what now happens in the log, for the same mail folder as above:

---cut---
Nov 1 15:59:42 magneto dovecot: IMAP(joeuser): Corrupted index cache file /home/joeuser/Maildir/.GNOME CVS commits/dovecot.index.cache: field header names corrupted
---cut---

At that point, I deleted the cache files again, and the error went away. I also noticed that the index.cache file in that folder is much, much larger than it was, from which I posit that the above error occurred because the crashing imap process had left an incomplete index file. Removing it thus forced a rebuild with the new code, which seems to have fixed the problem.

Thank you all for your patience. I hand this willingly over to the list.

jb

-- 
------------------------------------
J.B. Zimmerman jbz at ximian.com
Network Administrator
Ximian - http://www.ximian.com
...a tiny little division of Novell.