I am trying to track down an mbox corruption issue that has been happening for one of our users. It appears that the mbox file is getting truncated and the all-important From line no longer exists, which leaves the mbox useless. It also appears, as in most truncations, that parts of the mbox are being lost. I am not sure how to debug this to find out whether it is a client or a server issue, and any pointers would be appreciated.

Here are the facts as I have them:

Server: RHEL5 x64 / Dovecot 1.2.8
Client: Windows / Thunderbird 2.0.0.23

This appears to be happening to only one user out of several hundred across 30 or so Dovecot servers, so I am leaning towards this being a client issue, but I don't know a great deal about the low-level workings of IMAP to debug it. For example, I assume that the client sends the server a request along the lines of "delete message 34" and the server then rewrites the mbox file (which would point to it being a server issue)?

The corruption occurs in a random assortment of this user's files: sometimes the inbox, other times the Trash or one of the many other folders this user has. The user also reports that before the corruption occurs, duplicate e-mails start to appear in folders, even in folders that have been unused for a long period of time.

Oh, I suppose I should also mention that this is about the third or fourth time this has happened to this user, and it seems to happen about every 2-3 weeks.

Does any of this sound familiar to anyone? Any suggestions as to how to root this out?

Thanks,
-Erinn
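One way to put numbers on the two symptoms described above (a missing leading From line and duplicate messages) is to scan a copy of the affected mbox. The following is a minimal Python sketch under stated assumptions, not a Dovecot tool: `scan_mbox` is a hypothetical helper, it treats every line beginning with "From " as a message separator (so an unescaped "From " line inside a body would be over-counted), and it uses the Message-ID header as the duplicate key.

```python
import sys
from collections import Counter
from email.parser import BytesHeaderParser


def scan_mbox(data: bytes) -> dict:
    """Report basic structure of raw mbox bytes (hypothetical helper).

    Simplification: any line starting with b"From " counts as a message
    separator, so an unescaped "From " line inside a body is over-counted.
    """
    # Byte offset of every separator line.
    offsets = []
    pos = 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            offsets.append(pos)
        pos += len(line)

    # Slice the data into messages and count their Message-ID headers.
    parser = BytesHeaderParser()
    ids = Counter()
    bounds = offsets + [len(data)]
    for start, end in zip(bounds, bounds[1:]):
        # Drop the "From " separator line, then parse only the headers.
        _, _, rest = data[start:end].partition(b"\n")
        msg_id = parser.parsebytes(rest).get("Message-ID")
        if msg_id:
            ids[str(msg_id).strip()] += 1

    return {
        "starts_with_from": data.startswith(b"From "),
        # Bytes before the first separator; non-zero means the file does
        # not begin with a message (the missing-From-line symptom).
        "leading_bytes": offsets[0] if offsets else len(data),
        "separator_count": len(offsets),
        "duplicate_ids": {k: n for k, n in ids.items() if n > 1},
    }


if __name__ == "__main__":
    # Usage: python scan_mbox.py COPY_OF_MBOX  (work on a copy, not the live file)
    with open(sys.argv[1], "rb") as fh:
        report = scan_mbox(fh.read())
    for key, value in report.items():
        print(f"{key}: {value}")
```

A healthy file should report `starts_with_from: True` and `leading_bytes: 0`; a file that has lost its first From line will show a non-zero `leading_bytes`. Reading the whole file into memory keeps the sketch short, which is fine for a one-off check but not for very large mailboxes.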
On 12/7/2009, Erinn Looney-Triggs (erinn.looneytriggs at gmail.com) wrote:
> Does any of this sound familiar to anyone? Any suggestions as to how
> to root this out?

dovecot 0.99.x versions were notorious for this, but haven't heard of anything like this for recent versions.

So, this farm of 30 dovecot servers - have you tried to nail down whether it happens to this user only when they are accessing through one server? And are you certain that all 30 servers are running 1.2.8, and configured the same?

-- 

Best regards,

Charles
> On 12/7/2009, Erinn Looney-Triggs (erinn.looneytriggs at gmail.com) wrote:
>> Does any of this sound familiar to anyone? Any suggestions as to how
>> to root this out?
>
> dovecot 0.99.x versions were notorious for this, but haven't heard of
> anything like this for recent versions.
>
> So, this farm of 30 dovecot servers - have you tried to nail down
> whether it happens to this user only when they are accessing through one
> server? And are you certain that all 30 servers are running 1.2.8, and
> configured the same?

Sorry, I should have been clearer: all 30 systems are completely separate instances, a good example of inter-departmental non-cooperation ;). They are all running version 1.2.8 and are all configured identically, apart from compile differences for RHEL4 vs. RHEL5 and x86 vs. x86_64. So the user is always accessing the same server.

It is extremely odd that this issue is happening only to this one user on this one system, which still makes me lean toward it being a client issue, but I am unsure how to determine that.

-Erinn
On Mon, 2009-12-07 at 08:29 -0700, Erinn Looney-Triggs wrote:
> I am trying to track down an mbox corruption issue that has been
> happening for one of our users. It appears that the mbox file is getting
> truncated and the all important From line no longer exists which then
> leads to the mbox being useless. It also appears, as in most
> truncations, that parts of the mbox are being lost.

Can you give me some specific examples of what the corruption looks like? For example, you could put the mbox through http://dovecot.org/tools/mbox-anonymize.pl

> I am not sure how to
> debug this to find out whether this is a client or server issue and any
> pointers would be appreciated.

Client can't corrupt mailbox data no matter what they do. But it's possible that the user/client is doing something a bit differently than everyone else and that causes it.

What OS and filesystem do you use? It's a local filesystem and not NFS or something? Does Dovecot log any errors (about anything)?
On 12/07/2009 11:41 AM, Timo Sirainen wrote:
> On Mon, 2009-12-07 at 08:29 -0700, Erinn Looney-Triggs wrote:
>> I am trying to track down an mbox corruption issue that has been
>> happening for one of our users. It appears that the mbox file is getting
>> truncated and the all important From line no longer exists which then
>> leads to the mbox being useless. It also appears, as in most
>> truncations, that parts of the mbox are being lost.
>
> Can you give me some specific examples of what the corruption looks
> like? For example you could put the mbox through
> http://dovecot.org/tools/mbox-anonymize.pl
>
>> I am not sure how to
>> debug this to find out whether this is a client or server issue and any
>> pointers would be appreciated.
>
> Client can't corrupt mailbox data no matter what they do. But it's
> possible that the user/client is doing something a bit differently than
> everyone else and that causes it.
>
> What OS and filesystem do you use? It's a local filesystem and not NFS
> or something? Does Dovecot log any errors (about anything)?

I can dig up a copy of a malformed mbox and I will get it to you. As for the other details: this is a RHEL5 x64 system running an ext3 filesystem, nothing fancy like NFS. Dovecot does not appear to log any errors. I looked back and there was one error logged a couple of days before the issue occurred, but I am not sure if it is related:

Dec 2 11:19:07 dovecot: IMAP(XXXX): Next message unexpectedly lost from mbox file /home/XXXX/mail/Sent at 2102537 (cached)
Dec 2 11:19:07 dovecot: IMAP(XXXX): read(mail, uid=501) failed: Invalid argument
Dec 2 11:19:07 dovecot: IMAP(XXXX): Disconnected: Internal error occurred. Refer to server log for more information. [2009-12-02 11:19:07] bytes=177474/43685174

But as I said, this occurred a couple of days before the corruption issue, or at least before the user became aware of the corruption, though this user is connected all day every day.
Anyway, I will dig around for a copy of the corrupted mbox files.

Thanks,
-Erinn
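The first log line reports a position (2102537) in /home/XXXX/mail/Sent. Assuming that number is a byte offset into the mbox file (an assumption worth confirming against the Dovecot 1.2 source), one way to see what is actually there is to read the bytes around that offset and check whether a "From " separator starts at it. The helper below, `inspect_offset`, is a hypothetical read-only sketch in Python, not part of Dovecot.

```python
import os
import sys


def inspect_offset(path: str, offset: int, context: int = 80) -> dict:
    """Show what lies at `offset` in an mbox file (read-only).

    Assumption (hypothetical helper): `offset` is a byte offset, such as the
    number after "at" in a Dovecot "Next message unexpectedly lost" log line.
    """
    with open(path, "rb") as fh:
        size = os.fstat(fh.fileno()).st_size
        # Read up to `context` bytes before the offset and `context` bytes after it.
        fh.seek(max(0, offset - context))
        before = fh.read(min(offset, context))
        here = fh.read(context)
    # A separator must sit at the start of a line: file start or right after "\n".
    at_line_start = offset == 0 or before.endswith(b"\n")
    return {
        "size": size,
        "beyond_eof": offset >= size,
        "is_separator": at_line_start and here.startswith(b"From "),
        "before": before,
        "here": here,
    }


if __name__ == "__main__":
    # Usage: python inspect_offset.py COPY_OF_MBOX 2102537
    result = inspect_offset(sys.argv[1], int(sys.argv[2]))
    for key, value in result.items():
        print(f"{key}: {value!r}")
```

Because the mbox may have been rewritten since the error was logged, a mismatch today does not prove anything by itself; the check is most informative on a copy taken close to the time of the error.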
On 12/07/2009 12:18 PM, Charles Marcus wrote:
> On 12/7/2009, Erinn Looney-Triggs (erinn.looneytriggs at gmail.com) wrote:
>> I looked back and there was one error logged a couple of days before
>> the issue occurred but I am not sure if it is related:
>>
>> Dec 2 11:19:07 dovecot: IMAP(XXXX): Next message unexpectedly lost
>> from mbox file /home/XXXX/mail/Sent at 2102537 (cached)
>
> And just to confirm - /home/XXXX is the user with the problem?
>
> ;)

Yep, privacy requirements, even for user IDs. :)

-Erinn