I've been planning to add these for years, and it may finally be about time.
They could probably go into v2.2 already, but enabled only by a new setting,
because they require file format changes that older Dovecot versions can't
read. I could probably also patch v2.1 so that it can at least read the new
format without failing. In v2.3 the new format could then become the default.
And what would the checksums be exactly? Would the standard CRC32 and CRC8 work
fine, or are there any better ones?
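
To make the two candidates concrete, here is a minimal Python sketch (not Dovecot code). CRC32 comes from the standard zlib module. The CRC8 variant is an assumption on my part: polynomial 0x07, initial value 0, no reflection and no final XOR. Nothing above says which CRC8 parameters would be used.

```python
import zlib


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with init 0, no reflection, no final XOR (assumed variant)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def crc32(data: bytes) -> int:
    """Standard CRC-32 as implemented by zlib."""
    return zlib.crc32(data) & 0xFFFFFFFF


if __name__ == "__main__":
    sample = b"123456789"
    print(hex(crc32(sample)), hex(crc8(sample)))
```

An 8bit checksum can only take 256 values, so roughly 1 in 256 randomly corrupted records would still pass it. Also, with an initial value of 0 an all-zero record checksums to 0, so a zeroed-out block would look valid unless a non-zero initial value or final XOR is used.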
1. dovecot.index
v2.1+ always recreates this file in full and never overwrites data in place.
So the checksums would need to be written only when dovecot.index is being
recreated. There are 3 possible things to checksum:
- header (32bit checksum)
- all of the mail records (32bit checksum)
- each mail record independently (8bit checksum per mail)
The header's checksum could be verified every time the index is opened. The
full mail record checksum could be verified when something appears to be wrong,
but it's probably a waste of time to check it in normal operation.
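
As a rough sketch of that split between "always verify the header" and "verify the records only on suspicion", in Python. The layout here is hypothetical and is not the real dovecot.index format: an 8-byte header (record count plus a CRC32 of all records), followed by a CRC32 of the header itself, followed by the records. The names write_index and open_index are made up for the example.

```python
import struct
import zlib

HDR = struct.Struct("<II")   # hypothetical header: record count, CRC32 of all records
HDR_CRC = struct.Struct("<I")  # CRC32 of the header bytes, stored right after it


def write_index(records: list) -> bytes:
    """Build the whole file at once, as happens when the index is recreated."""
    body = b"".join(records)
    hdr = HDR.pack(len(records), zlib.crc32(body))
    return hdr + HDR_CRC.pack(zlib.crc32(hdr)) + body


def open_index(data: bytes, verify_records: bool = False):
    """Always check the header checksum; check the records only on request."""
    hdr = data[:HDR.size]
    (hdr_crc,) = HDR_CRC.unpack_from(data, HDR.size)
    if zlib.crc32(hdr) != hdr_crc:
        raise ValueError("header checksum mismatch")
    count, body_crc = HDR.unpack(hdr)
    body = data[HDR.size + HDR_CRC.size:]
    if verify_records and zlib.crc32(body) != body_crc:
        raise ValueError("record checksum mismatch")
    return count, body
```

A normal open would call open_index(data), and only the error-handling path would pass verify_records=True.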
I'm not really sure about the per-mail checksums. It would be easy to create
them while dovecot.index is being created, but after reading the file into
memory the records are updated in many ways in many places. It's probably
not worth the complexity and extra slowness to verify and/or update the
checksums in all the different places. So is it worth it to even have them? In
error conditions when fixing up indexes it could be useful to skip over records
with broken checksums (and check if the mail is in dovecot.index.backup with a
correct checksum). Maybe that's enough to be worth 1 byte per message?
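
The fix-up case could look roughly like the following Python sketch. Everything about the record layout is an assumption made for illustration: a 4-byte UID, 1 byte of flags and 1 trailing CRC8 byte, with the same assumed CRC8 parameters (polynomial 0x07, init 0). Records that fail their checksum are skipped, and a UID is taken from the backup only when the main index has no valid copy of it.

```python
import struct

REC_SIZE = 6  # hypothetical record: 4-byte UID, 1-byte flags, 1-byte CRC8


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with init 0, no reflection, no final XOR (assumed variant)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def pack_record(uid: int, flags: int) -> bytes:
    body = struct.pack("<IB", uid, flags)
    return body + bytes([crc8(body)])


def valid_records(blob: bytes) -> dict:
    """Map UID -> record for every record whose CRC8 matches; skip the rest."""
    out = {}
    for pos in range(0, len(blob) - REC_SIZE + 1, REC_SIZE):
        rec = blob[pos:pos + REC_SIZE]
        if crc8(rec[:-1]) == rec[-1]:
            out[struct.unpack_from("<I", rec)[0]] = rec
    return out


def recover(index_blob: bytes, backup_blob: bytes) -> list:
    """Prefer verified records from the main index, fall back to the backup."""
    good = valid_records(backup_blob)
    good.update(valid_records(index_blob))  # main index wins when both verify
    return [good[uid] for uid in sorted(good)]
```

A real fixer would also have to decide whether a UID found only in the backup was lost to corruption or had simply been expunged since the backup was written. The sketch ignores that.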
2. dovecot.index.log
This file is only appended to. Each committed transaction could be prefixed in
the new format with <transaction size><transaction 32bit checksum>.
With the new format this wouldn't actually increase the log file size much,
because there is already some space wasted for a compatibility
"boundary" record that could be removed now.
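
A minimal Python sketch of such a prefix, under my own assumptions: both fields are 32bit little-endian, the size counts only the transaction payload (not the prefix), and the checksum is zlib's CRC32. The reader stops at the first transaction that is truncated or fails its checksum, which is the natural behavior for an append-only file.

```python
import struct
import zlib

PREFIX = struct.Struct("<II")  # assumed: payload size, CRC32 of payload


def append_transaction(log: bytearray, payload: bytes) -> None:
    """Append <size><crc32><payload> to the in-memory log."""
    log += PREFIX.pack(len(payload), zlib.crc32(payload)) + payload


def read_transactions(log) -> list:
    """Return all leading transactions that are complete and verify."""
    out = []
    pos = 0
    while pos + PREFIX.size <= len(log):
        size, crc = PREFIX.unpack_from(log, pos)
        start = pos + PREFIX.size
        payload = bytes(log[start:start + size])
        if len(payload) != size or zlib.crc32(payload) != crc:
            break  # truncated or corrupted: ignore this and everything after it
        out.append(payload)
        pos = start + size
    return out
```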
3. dovecot.index.cache
The cache file is the most complex one. Its headers get overwritten once in a
while. Probably not worth the trouble to checksum the header itself, and
there's not a lot that could be done even if a broken checksum was found.
But each mail_cache_record could have its own checksum. An 8bit checksum could be
added without increasing the file's size. Maybe that would be enough?
4. dovecot.index.thread
This is a rather simple file and a 32bit checksum could be added to its header,
and verified every time the file is read (because it's fully read anyway).
5. dovecot.mailbox.log
This file doesn't even have a header. There are 3 unused bytes in each
record currently. One of them could be used for a new "flags"
parameter, with the only flag being "checksum added". There would
still be space left for an 8bit or 16bit checksum.
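
Here is how the flag byte would keep old records readable, as a Python sketch. The layout is hypothetical: the 3 spare bytes are treated as trailing the record body, holding 1 flags byte and a 16bit checksum. The real position of the unused bytes inside the record is not assumed here. The 16bit checksum uses binascii.crc_hqx (CRC-CCITT) from the standard library purely as a stand-in.

```python
import binascii

FLAG_CHECKSUM = 0x01  # hypothetical "checksum added" flag


def seal(body: bytes) -> bytes:
    """New-format record: body, flags byte, 16bit checksum of the body."""
    crc = binascii.crc_hqx(body, 0)
    return body + bytes([FLAG_CHECKSUM]) + crc.to_bytes(2, "little")


def check(record: bytes) -> bool:
    """Verify a record, accepting old records whose spare bytes carry no flag."""
    body, flags, stored = record[:-3], record[-3], record[-2:]
    if not flags & FLAG_CHECKSUM:
        return True  # old-format record: nothing to verify
    return binascii.crc_hqx(body, 0).to_bytes(2, "little") == stored
```

Records written by old versions leave the flag unset, so they pass through unverified instead of being rejected. The sketch takes for granted that old versions write zeros into the unused bytes.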
6. Other files
There are also some text files, like dovecot-acl, subscriptions, quota usage and
Sieve scripts. They probably have to be without checksums for now.