I am running Dovecot IMAP on Linux, on a LizardFS storage cluster with
Maildir storage. This has worked well for most of the accounts for
several months.

However in the last couple of weeks we are seeing increasing errors
regarding corrupted index files. Some of the accounts affected are
unable to retrieve messages due to timeouts.

It appeared the problems were due to the accounts being accessed from
multiple servers simultaneously, so I forced them all to access one
server, but the errors remained. It looks like it has something to do
with file locking, but LizardFS supports advisory file locking and I do
have it enabled.

Deleting the corrupted indexes fixes the problem for a while, but it
eventually returns, particularly for some accounts.

Here are some errors I'm seeing (just a random grab). Actual home
directories are munged for confidentiality.

imap[25157]: (clientes.standby) Error: Failed to fix view for HOME/clientes:standby/dovecot.index: Missing middle file seq=1 (between 1..1, we have seqs 8): File is already open
imap[5565]: (stadiumchair) Error: Transaction log file HOME/stadiumchair/.Drafts/dovecot.index.log: marked corrupted
imap[5005]: (stadiumchair) Error: Corrupted transaction log file HOME/stadiumchair/.Drafts/dovecot.index.log seq 2: indexid changed 1418941056 -> 1500658549 (sync_offset=0)
imap[20243]: (martha) Error: Transaction log HOME/martha/dovecot.index.log: duplicate transaction log sequence (539)
imap[4665]: (emsspam) Error: Index file HOME/emsspam/dovecot.index: indexid changed: 1500658479 -> 1297175382
imap[4665]: (emsspam) Error: Corrupted transaction log file HOME/emsspam/dovecot.index.log seq 3: indexid changed: 1500658479 -> 1297175382 (sync_offset=316)
imap[22985]: (emsspam) Error: Corrupted transaction log file HOME/emsspam/dovecot.index.log seq 10742: Invalid transaction log size (9296 vs 9296): HOME/emsspam/dovecot.index.log (sync_offset=9296)
imap[3267]: (emsspam) Error: Failed to map view for HOME/emsspam/dovecot.index: Failed to map file seq=10742 offset=9052..18446744073709551615 (ret=0): corrupted, indexid=0
imap[3267]: (emsspam) Error: HOME/emsspam/dovecot.index view is inconsistent: uid=3062271 inserted in the middle of mailbox

The output of dovecot -n is pasted in below. Note that some of the boxes
are running Linux 4.9 and some 4.4; all have the same problems. Also note
that I am using a custom authentication front end for our virtual
mailboxes, but it just sets up the minimal environment variables and
runs imap.

Is there anything I can change to eliminate these problems? Are there
any other diagnostics I can provide to shed light on this?

# 2.2.31 (65cde28): /etc/dovecot/dovecot.conf
# OS: Linux 4.4.66 x86_64 Gentoo Base System release 2.3 
log_path = /dev/stderr
mail_debug = yes
mail_fsync = always
mail_location = maildir:~/.maildir
mail_log_prefix = "%s[%p]: (%u) "
mmap_disable = yes
namespace inbox {
  inbox = yes
  location = 
  mailbox Drafts {
    special_use = \Drafts
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    special_use = \Trash
  }
  prefix = INBOX
  separator = 
  type = private
}
passdb {
  args = *
  driver = pam
}
passdb {
  args = /etc/dovecot/dovecot-sql.conf.ext
  driver = sql
}
plugin {
  mail_log_events = delete undelete expunge copy mailbox_delete mailbox_rename
}
ssl_cert = </etc/ssl/dovecot/server.pem
ssl_key =  # hidden, use -P to show it
userdb {
  driver = passwd
}
userdb {
  args = /etc/dovecot/dovecot-sql.conf.ext
  driver = sql
}
-- 
Bruce Guenter <bruce at untroubled.org>               
http://untroubled.org/

On 21.07.2017 at 19:47, Bruce Guenter wrote:
> I am running Dovecot IMAP on Linux, on a LizardFS storage cluster with
> Maildir storage. This has worked well for most of the accounts for
> several months.
>
> However in the last couple of weeks we are seeing increasing errors
> regarding corrupted index files.

You should avoid this. One solution is to use load balancers with
persistence and/or something like https://wiki2.dovecot.org/Director

I don't know LizardFS, but the problems are much the same with all
storage clusters, and there are different solutions for handling them,
so I don't know which would be best for your site. I would read up and
ask here about settings for storage clusters. A good start could be:

https://wiki2.dovecot.org/NFS
https://wiki2.dovecot.org/SharedMailboxes/ClusterSetup
https://wiki2.dovecot.org/MailLocation/SharedDisk

> Some of the accounts affected are
> unable to retrieve messages due to timeouts.

Index settings and mailbox format both have an impact on this. Maildir
is mostly self-healing, but that may sometimes fail on a cluster.

> It appeared the problems were due to the accounts being accessed from
> multiple servers simultaneously, so I forced them all to access one
> server, but the errors remained. It looks like it has something to do
> with file locking, but LizardFS supports advisory file locking and I do
> have it enabled.
>
> Deleting the corrupted indexes fixes the problem for a while, but it
> eventually returns, particularly for some accounts.

Yes, that is perhaps by design.

> Is there anything I can change to eliminate these problems? Are there
> any other diagnostics I can provide to shed light on this?

I think you could make the corruption rarer by optimizing settings,
e.g.

mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
mmap_disable = yes

etc., but to fix it completely you may have to rethink your whole
setup. The Dovecot gurus may be able to help; also search the list
archive for cluster setups.

Best Regards
Robert Schetterer

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Schleißheimer Straße 26/MG, 80333 München

Registered office: München, District court München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein

On Fri, Jul 21, 2017 at 08:50:16PM +0200, Robert Schetterer wrote:
> you should avoid this
> one solution is to use loadbalancers with persistance

We had been using a loadbalancer with persistence to reduce the
problems, and today I switched to everything running on a single box to
avoid any cross-node contention. Unfortunately, the problem still
happens, even when they were all running imap on a single box. We are
moving to a director type setup instead of a persistent load balancer to
eliminate the last source of cross-node access.

> i think you could rare the corrupt
> with optimize settings
> to i.e
>
> mail_fsync = always
> mmap_disable = yes

I have those, but...

> mail_nfs_storage = yes
> mail_nfs_index = yes

I missed seeing those. Thanks

-- 
Bruce Guenter <bruce at untroubled.org>
http://untroubled.org/
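
For reference, the four settings named in this exchange can be collected into one fragment. This is only a sketch of what the thread discusses, not a tested fix: mail_nfs_storage and mail_nfs_index are documented for NFS, and whether they change anything on LizardFS is an assumption that nobody in the thread has confirmed.

```
# Already present in the dovecot -n output posted at the top of the thread
mail_fsync = always
mmap_disable = yes

# Suggested in the reply, not yet set on these servers.
# Documented for NFS; their effect on LizardFS is an assumption.
mail_nfs_storage = yes
mail_nfs_index = yes
```

After editing /etc/dovecot/dovecot.conf and reloading Dovecot, dovecot -n should list all four settings.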

On 21.07.2017 20:47, Bruce Guenter wrote:
> I am running Dovecot IMAP on Linux, on a LizardFS storage cluster with
> Maildir storage. This has worked well for most of the accounts for
> several months.
>
> However in the last couple of weeks we are seeing increasing errors
> regarding corrupted index files. Some of the accounts affected are
> unable to retrieve messages due to timeouts.
>
> It appeared the problems were due to the accounts being accessed from
> multiple servers simultaneously, so I forced them all to access one
> server, but the errors remained. It looks like it has something to do
> with file locking, but LizardFS supports advisory file locking and I do
> have it enabled.

Do you have users accessing the files concurrently from more than one
dovecot instance at a time?

Aki

On Mon, Jul 24, 2017 at 08:39:36AM +0300, Aki Tuomi wrote:
> Do you have users accessing the files concurrently from more than one
> dovecot instance at a time?

Yes. Apparently it is fairly common behavior for some IMAP clients to
open up multiple connections to the same mailbox. Sometimes the
multiple accesses came from different servers (a standalone IMAP client
and a webmail system), but there is corruption even when all the
accesses are going through the same server.

(Yes, we need a director. I am working on integrating that into our
network.)

-- 
Bruce Guenter <bruce at untroubled.org>
http://untroubled.org/