On Mon, Oct 15, 2018 at 08:50:21AM +0000, Raymond Sellars wrote:
> Hi
>
> Looking for some insight into mdbox index file management and recovery from
> corruptions.
>
>
> I have a two-node cluster on NFS with a proxy director in front for user
> stickiness. One node (a nominated master) replicates bidirectionally to a
> third node on a DR site.
>
> We periodically get index file corruptions resulting in rebuilds. However,
> the user experience is poor, as messages read or deleted months or years
> ago all reappear as unread.
>
> We've seen corruption caused by NFS NTP time sync problems and by the proxy
> not being sticky, but also by the DR node being offline for a while and then
> tripping corruption within production when it comes back online.
>
> Error message example (1 of):
>
> Error: Corrupted dbox file /mailshare/.. (removed) ../home/mail/storage/m.4 (around offset=993548): EOF reading msg header (got 0/30 bytes)
>
> https://wiki2.dovecot.org/MailboxFormat/dbox - I've read all the
> documentation I can find and understand that "you must not lose the dbox
> index files, they can't be regenerated without data loss."
>
> Questions:
> #1 Any additional tips for avoiding mdbox index corruptions with dsync? Or
> should I revert to maildir format? I like the performance premise of mdbox,
> but these index corruptions are a reliability issue.
>
> #2 I'm guessing read status is one of the metadata items lost, but it
> seems it can't be recovered from the dovecot.index.backup files either. Is
> there any technique to preserve that item, as it's key to the user
> experience?
>
> #3 If the index and transaction logs are so critical, is there some kind of
> checkpoint backup I can take? Is there a native dovecot feature, or do I
> need to script something?
>
> #4 I've noticed that rebuilding the index does not work if the
> dovecot.index.log file is lost (deleted as a hard test). The
> dovecot.index.cache file can be lost, but once the log file is gone,
> messages are not recovered from the storage directory, either automatically
> or manually (that I can find).
>
> I've not seen any dovecot.index.log file corruptions, but that file seems
> very high risk. Is the index rebuilt only from the log file, or by a
> combined process that also uses the storage directory?
>
> Is there perhaps an option to just use the transaction log and not the
> index? Although that doesn't sound wise for performance.
>
> #5 In addition to the UNREAD status, we also notice that files moved to the
> trash reappear. Is that expected behavior?
>
> Thanks
> Raymond
>
What version of dovecot and what OS are you running? Is the NFS server
Linux/BSD/NetApp/etc.?