Maybe I was a bit unclear: I have about 1000 error messages per day from
random accounts (about 500 in total so far) on all clusters. These are
transparent to the user, so it's more like background noise at the moment.
No VM involved. All machines are bare-metal DRBD two-node clusters.
As far as I can see, I cannot pin it down to specific accounts, POP3 vs.
IMAP, LMTP delivery vs. IMAP store, Sieve vs. non-Sieve, etc.
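
In case it helps anyone reproduce the tally, here is a rough sketch of how the log entries could be grouped by whether the "expected" value equals the S= size in the filename. The regular expression is only modelled on the two entries quoted further down, and the names `ENTRY`, `classify` and `tally` are made up for this example, so treat it as an assumption-laden starting point rather than a tested tool.

```python
import re
from collections import Counter

# Pattern modelled on the two log entries quoted in this thread (assumption:
# other Dovecot versions may word the message differently).
ENTRY = re.compile(
    r"UID (?P<uid>\d+): Broken physical size.*?"
    r",S=(?P<s>\d+)(?:,W=(?P<w>\d+))?.*?"
    r"smaller than expected \((?P<cached>\d+) < (?P<expected>\d+)",
    re.DOTALL,
)


def classify(line):
    """Label one log entry, or return None if it does not match the pattern."""
    match = ENTRY.search(line)
    if match is None:
        return None
    if int(match["expected"]) == int(match["s"]):
        return "expected_matches_S"
    return "expected_differs_from_S"


def tally(lines):
    """Count how many matching entries fall into each label."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Feeding it the mail log (for example `tally(open(path))`, with `path` being wherever the Dovecot log lives on your system) gives a quick split between entries like the first one quoted below, where the expected value is the S= size, and entries like the second one, where it is something else such as 8192.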
Tim
On 02.02.23 at 17:55, Christopher Wensink wrote:
> Can you isolate the problem account on a separate VM to see if the
> problem follows the account or the original VM?
>
> Chris
>
> On 2/2/2023 9:58 AM, Tim Evers wrote:
>> Good point - these are 8 different DRBD clusters. I failed one over
>> to test this theory. The problem persists.
>>
>> So I would rule out underlying issues.
>>
>> Especially since the "wrong" value is suspiciously often the on-disk
>> size rather than a random value one would expect if there is
>> corruption underneath.
>>
>> Tim
>>
>> On 02.02.23 at 16:43, Christopher Wensink wrote:
>>> Something to try: this could all be happening because of an underlying
>>> disk failure on the array it is running on. If this is a VM, can
>>> you move the operation to another host or data store to rule out
>>> hardware issues?
>>>
>>> On 2/2/2023 9:19 AM, Stuart Henderson wrote:
>>>> On 2023-02-01, Tim Evers <te-ml-ext at artfiles.de> wrote:
>>>>> I run a fairly large Dovecot installation (around 100k mailboxes) on
>>>>> several servers.
>>>>>
>>>>> gzip compression is on.
>>>>>
>>>>> Every once in a while I get the dreaded "cache corruption" messages in
>>>>> the log:
>>>>>
>>>>> Error: Corrupted record in index cache file
>>>>> /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size
>>>>> in mailbox INBOX:
>>>>> read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,))
>>>>> failed: Cached message size smaller than expected (2877 < 8099,
>>>>> box=INBOX, UID=3868)
>>>>>
>>>>> Error: Corrupted record in index cache file
>>>>> /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size
>>>>> in mailbox INBOX:
>>>>> read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,))
>>>>> failed: Cached message size smaller than expected (5533 < 8192,
>>>>> box=INBOX, UID=3875)
>>>>>
>>>>> The first entry shows 2877 (size on disk) vs. 8099 (real size unzipped,
>>>>> also in the filename: S=8099).
>>>>>
>>>>> The second entry shows 5533 (size on disk) vs. 8192 - this is not
>>>>> correct in any way. The real (unzipped) size is 13907, as noted in the
>>>>> filename (S=13907).
>>>>>
>>>>> Both mails were delivered through LMTP and retrieved by the POP3
>>>>> service.
>>>>>
>>>>> Anyone with an idea what might be happening here? I've read all
>>>>> available info in the docs and in the previous discussions / bug reports,
>>>>> but nothing seems to match my case. And where does that 8192 come from?
>>>>> It looks suspicious.
>>>>>
>>>>> Version is 2.3.7.2 (Ubuntu 20.04)
>>>> 2.3.7.2 is rather old now. There were definitely fixes regarding
>>>> compression around the 2.3.10-2.3.12 timeframe or thereabouts (I forget
>>>> all the details, but it took a release or two before some remaining
>>>> issues were sorted out after changes in the area). I'd be looking to get
>>>> it updated to a current version first.
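
For checking a flagged file by hand, the comparison described in the original report (size on disk vs. unzipped size vs. the S= value in the filename) could be sketched roughly as below. This assumes the messages are gzip-compressed, as stated in the thread, and that S= is meant to hold the uncompressed size. The function names `sizes_from_name` and `inspect_message` are invented for this example and are not part of Dovecot.

```python
import gzip
import os
import re


def sizes_from_name(path):
    """Return the S= and W= values embedded in a Maildir filename, if present."""
    name = os.path.basename(path)
    found = dict(re.findall(r",([SW])=(\d+)", name))
    return {key: int(value) for key, value in found.items()}


def inspect_message(path):
    """Compare on-disk size, uncompressed size and the S= value of one file."""
    on_disk = os.path.getsize(path)
    try:
        with gzip.open(path, "rb") as handle:
            uncompressed = len(handle.read())
    except OSError:
        # Not a gzip file: treat it as stored uncompressed.
        uncompressed = on_disk
    claimed = sizes_from_name(path).get("S")
    return {
        "on_disk": on_disk,
        "uncompressed": uncompressed,
        "S": claimed,
        "s_matches_uncompressed": claimed == uncompressed,
    }
```

Running `inspect_message()` on the files named in the two log entries would show whether the cached value in the error message corresponds to the on-disk size, and whether S= agrees with what the file actually decompresses to.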