Something to try: this could all be caused by an underlying disk failure on the array it is running on. If this is a VM, can you move the operation to another host or data store to rule out hardware issues?

On 2/2/2023 9:19 AM, Stuart Henderson wrote:
> On 2023-02-01, Tim Evers <te-ml-ext at artfiles.de> wrote:
>> I run a fairly large Dovecot installation (around 100k mailboxes) on
>> several servers.
>>
>> gzip compression is on.
>>
>> Every once in a while I get the dreaded "cache corruption" messages in
>> the log:
>>
>> Error: Corrupted record in index cache file
>> /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size
>> in mailbox INBOX:
>> read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,))
>> failed: Cached message size smaller than expected (2877 < 8099,
>> box=INBOX, UID=3868)
>>
>> Error: Corrupted record in index cache file
>> /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size
>> in mailbox INBOX:
>> read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,))
>> failed: Cached message size smaller than expected (5533 < 8192,
>> box=INBOX, UID=3875)
>>
>> The first entry shows 2877 (size on disk) vs. 8099 (real size unzipped,
>> also in the filename: S=8099).
>>
>> The second entry shows 5533 (size on disk) vs. 8192 - this is not
>> correct in any way. Size on disk is 13907 as noted in the filename.
>>
>> Both mails were delivered through LMTP and retrieved by the POP3 service.
>>
>> Anyone with an idea what might be happening here? I've read all
>> available info in the doc and in the previous discussions / bug reports,
>> but nothing seems to match my case. And where does that 8192 come from -
>> it looks suspicious?
>>
>> Version is 2.3.7.2 (Ubuntu 20.04)
> 2.3.7.2 is rather old now. There were definitely fixes regarding compression
> around the 2.3.10-2.3.12 timeframe or thereabouts (I forget all the details,
> but it took a release or two before some remaining issues were sorted out
> after changes in the area). I'd be looking to get it updated to a current
> version first.

-- 
Christopher Wensink
IS Administrator
Five Star Plastics, Inc
1339 Continental Drive
Eau Claire, WI 54701
Office: 715-831-1682
Mobile: 715-563-3112
Fax: 715-831-6075
cwensink at five-star-plastics.com
www.five-star-plastics.com

It could even be memory. I once had a faulty memory module (non-ECC) in an office machine, and it caused the md5sums of files on TrueCrypt USB backup drives to change constantly. After I removed the module, the issue went away.

> Something to try, this all could be happening because of underlying disk
> failure on the array it is running on. If this is a VM, can you move
> the operation to another host or data store to rule out hardware issues?

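A quick way to look for the kind of symptom described above (checksums of an unchanged file that keep changing) is to hash the same file several times and compare the results. This is only a minimal sketch: the function names `file_digest` and `digests_stable` are made up for illustration, SHA-256 stands in for md5sum, and repeated reads may be served from the page cache, so a stable result does not prove the hardware is healthy.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digests_stable(path, rounds=5):
    """Hash the same file `rounds` times; True if every digest is identical."""
    return len({file_digest(path) for _ in range(rounds)}) == 1
```

If `digests_stable()` returns False for a file nobody is writing to, suspect memory, controller, or disk before suspecting the application.
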
Good point - these are 8 different DRBD clusters. I failed over one of them to test this theory, and the problem persists, so I would rule out underlying hardware issues. Especially since the "wrong" value is suspiciously often the on-disk size rather than the random value one would expect if there were corruption underneath.

Tim

On 02.02.23 at 16:43, Christopher Wensink wrote:
> Something to try, this all could be happening because of underlying
> disk failure on the array it is running on. If this is a VM, can you
> move the operation to another host or data store to rule out hardware
> issues?
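
The size comparison discussed in this thread can be reproduced per file with a small script. The sketch below assumes, as the first log entry suggests, that the S= field in the Maildir filename holds the uncompressed message size, and that compressed files start with the gzip magic bytes. The helper names `filename_size` and `check_message` are hypothetical and not part of Dovecot.

```python
import gzip
import os
import re

# Matches the ",S=<bytes>" field in a Maildir filename such as
# 1674129792.M797543P21755.node2,S=8099,W=8276:2,
SIZE_FIELD = re.compile(r",S=(\d+)")


def filename_size(path):
    """Return the S= value from the filename, or None if it is absent."""
    m = SIZE_FIELD.search(os.path.basename(path))
    return int(m.group(1)) if m else None


def check_message(path):
    """Compare S= from the filename with the on-disk and decompressed sizes."""
    expected = filename_size(path)
    on_disk = os.path.getsize(path)
    with open(path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"  # gzip magic number
    if is_gzip:
        with gzip.open(path, "rb") as f:
            actual = len(f.read())
    else:
        actual = on_disk  # assumption: file is stored uncompressed
    return {
        "expected": expected,
        "on_disk": on_disk,
        "actual": actual,
        "ok": expected == actual,
    }
```

Running `check_message()` over the files named in the error messages shows whether the file itself is consistent with its S= value. If it is, the mismatch lives in the cached record rather than in the stored message.
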