Hi all,

I just wanted to mention that the backup process described below seems
to work. The gap of about 100 files is still roughly the same, and I
investigated the cause further. It comes from metadata such as index
and cache files, which are present in some folders but not in all of
them. Counting only the files whose names contain the sequence ,S= and
summing their sizes gave the same file count and exactly the same total
size of raw mail data in the live directory and in the backup copy.

I also haven't received any notifications about backups that actually
failed, so I believe the backup works correctly.
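
For anyone who wants to repeat the comparison, here is a minimal sketch
of the check, not the exact commands I ran. It assumes GNU find and awk.
The function name maildir_stats and the paths in the usage comment are
only illustrative. It counts the files whose names contain ,S= and sums
their sizes, so index and cache files are ignored.

```shell
#!/bin/bash

# Print "<file count> <total bytes>" for all files below directory $1
# whose names contain ",S=" (the maildir size marker).
# Assumes GNU find (-printf) and awk.
maildir_stats() {
  find "$1" -type f -name '*,S=*' -printf '%s\n' |
    awk '{ n++; bytes += $1 } END { printf "%d %.0f\n", n, bytes }'
}

# Usage (hypothetical paths): the two output lines should be identical.
# maildir_stats /var/vmail/my.domain/another_address/Maildir
# maildir_stats /var/vmail/backup/my.domain/another_address/Maildir
```

If both calls print the same pair of numbers, the remaining difference
in the plain file count comes from files that are not messages.
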
Regards
Christian

On 09.01.2022 21:57, Christian wrote:
> Hi all,
>
> first: I'm using version 2.3.4.1
>
> I manage some rather large IMAP mailboxes which I want to back up on a
> regular basis. Some of them have relatively heavy traffic and one of
> them is larger than 30 GB.
>
> I studied the docs for doveadm backup
> (https://wiki2.dovecot.org/Tools/Doveadm/Sync) and even did some code
> research to better understand the process.
>
> The docs state that using stateful synchronization is the most
> efficient way to synchronize mailboxes, therefore I chose this approach.
>
> High-level overview:
>
> - store a copy of the whole maildir in a separate directory
> (/var/vmail/backup)
> - back up to this directory once a minute (trying to make the most of
> the transaction logs), using the last state stored in a file
> - create a backup once a day using tar (full, differential and
> incremental ones), blocking the sync from the previous step while
> tar runs
>
> I quite often receive notifications that doveadm backup returned an
> exit code of 2, which should be quite normal. These notifications look
> like this:
>
> dsync(another_address at my.domain): Warning: Failed to do incremental
> sync for mailbox INBOX, retry with a full sync (Modseq 171631 no
> longer in transaction log (highest=177818, last_common_uid=177308,
> nextuid=177309))
> dsync(another_address at my.domain): Warning: Mailbox changes caused a
> desync. You may want to run dsync again: Remote lost mailbox GUID
> e9149d0ae4e02d532505000026ca4352 (maybe it was just deleted?)
> Synced another_address at my.domain successfully but missing some
> changes. Took 3 seconds. Starting retry 1...
>
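
A small sketch of how a wrapper can tell these cases apart. It relies on
the exit codes as documented in the doveadm-sync man page (0 for
success, 2 for a sync that completed but could not apply all changes,
anything else for failure). The function name classify_sync_status and
the action names are made up for illustration and are not part of
doveadm.

```shell
#!/bin/bash

# Map a doveadm backup exit status to the action a wrapper should take.
# The action names are hypothetical labels, not doveadm output.
classify_sync_status() {
  case "$1" in
    0) echo "ok" ;;      # fully synchronized
    2) echo "retry" ;;   # synced, but some changes are missing: run again
    *) echo "failed" ;;  # real error: investigate or start over
  esac
}

# Usage (variables are placeholders):
# doveadm backup -u "$user" -s "$oldstate" "$dest" > "$statefile"
# action=$(classify_sync_status $?)
```

Treating 2 as "run again" rather than as an error matches the retry
message shown in the notification above.
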
>
> The first message seems to say that the transaction log was rotated
> and no longer contains the changes made since the state of the backup
> dir, right? I thought about setting mail_index_log_rotate_min_age to
> 1hour to keep the transaction logs from being rotated too often, but
> abandoned this idea and instead increased the backup frequency to once
> a minute. The warnings still appear, so maybe my assumptions about the
> transaction logs are wrong. The second message seems less alarming to
> me.
>
> How does doveadm backup behave in such situations? Does it directly
> fall back to a less efficient way of syncing mails? Does the state
> store the information "retry with a full sync" so that the next run
> uses this mode? To investigate this I simply measured runtimes and saw
> that the second (retry) run takes a bit longer (up to about 15
> seconds) to sync the dir.
>
> I'm afraid of losing messages with my approach. Is it safe to always
> use doveadm backup -s $state? Simply counting the files of one maildir
> in the live directory and in the backup copy shows about 100 fewer
> files in the backup dir, although the script has only been running for
> a few days.
>
> For reference, see my backup script below.
>
>
> Regards
>
> Christian
>
>
> #!/bin/bash
>
> # * * * * * /root/bin/backup.sh --sync-only
> # 12 2 1-7 * * test $(date +\%u) -eq 6 && /root/bin/backup.sh --full
> # 12 2 8-31 * * test $(date +\%u) -eq 6 && /root/bin/backup.sh --differential
> # 12 2 * * * test $(date +\%u) -ne 6 && /root/bin/backup.sh
>
> synconly=0
> differential=0
> fullbackup=0
> if [ $# -gt 0 ] ; then
>   if [ "$1" == "--sync-only" ] ; then
>     synconly=1
>   elif [ "$1" == "--differential" ] ; then
>     differential=1
>   elif [ "$1" == "--full" ] ; then
>     fullbackup=1
>   fi
> fi
>
> basedir="/var/vmail/backup"
> targetdir="/var/vmail/backup/done"
> mailaddresses="one_address@my.domain another_address@my.domain yet_another@my.domain"
>
> if [ ! -d "$basedir" ] ; then
>   mkdir -p "$basedir"
>   chown vmail:vmail "$basedir"
> fi
> if [ ! -d "$targetdir" ] ; then
>   mkdir -p "$targetdir"
>   chown vmail:vmail "$targetdir"
> fi
>
> for mailaddr in ${mailaddresses} ; do
>   #echo "Creating backup for $mailaddr."
>
>   domainpart=${mailaddr#*@}
>   localpart=${mailaddr%%@*}
>   lockfile="$basedir/$mailaddr.lock"
>   statefile="$basedir/$mailaddr.state"
>   backupdir="$domainpart/$localpart/Maildir"
>   snapshotfile_full="$basedir/$mailaddr.full.snar"
>   snapshotfile="$basedir/$mailaddr.snar"
>   backup_basename="$basedir/${mailaddr}_$(date '+%Y%m%d_%H%M%S')"
>
>   (
>     if [ $synconly -eq 1 ] ; then
>       flock -xn 200
>       if [ $? -eq 1 ] ; then
>         # failed to acquire lock. Skip mailbox silently.
>         exit
>       fi
>     fi
>
>     # try to acquire exclusive lock for one minute
>     flock -xw 60 200
>     if [ $? -eq 1 ] ; then
>       echo "Failed to acquire write lock within 60 seconds. Skipping $mailaddr."
>       exit
>     fi
>
>     retries=0
>     retval=1
>
>     until [ $retval -eq 0 ] || [ $retries -ge 3 ] ; do
>       let 'retries++'
>       if [ -f "$statefile" ] ; then
>         oldstate=$(head -1 "$statefile")
>       else
>         oldstate=""
>       fi
>       start_time=$(date +%s)
>       # stdout (the new state) goes to $statefile; stderr is captured in ERROR.
>       # The space in "$( (" keeps the shell from reading "$((" as arithmetic.
>       ERROR=$( (doveadm backup -u "$mailaddr" -s "$oldstate" "maildir:$basedir/$backupdir") 2>&1 > "$statefile" )
>       retval=$?
>       end_time=$(date +%s)
>       let 'duration=end_time-start_time'
>       if [ $retval -eq 2 ] ; then
>         #if [ $retries -gt 1 ] ; then
>           echo "$ERROR"
>           echo "Synced $mailaddr successfully but missing some changes. Took $duration seconds. Starting retry $retries..."
>         #fi
>       elif [ $retval -ne 0 ] ; then
>         echo "$ERROR"
>         echo "Syncing $mailaddr failed. Return code $retval. Took $duration seconds. Removing backup directory and starting retry $retries..."
>         rm -rf "$basedir/$backupdir"
>         rm -f "$statefile" "$snapshotfile"
>       elif [ $retries -gt 1 ] ; then
>         echo "Successful sync took $duration seconds."
>       fi
>     done
>
>     # downgrade lock to shared lock
>     flock -sn 200
>     [ $synconly -eq 1 ] && exit
>
>     if [ $retval -ne 0 ] ; then
>       echo "Too many retries. Aborting backup of $mailaddr."
>       exit
>     fi
>
>
>     cd "$basedir"
>     if [ $fullbackup -eq 1 ] || [ ! -f "$snapshotfile_full" ] ; then
>       tar -cpzf "${backup_basename}_full.tar.gz" --level=0 -g "$snapshotfile_full" "$backupdir"
>       cp -f "$snapshotfile_full" "$snapshotfile"
>     else
>       suffix=""
>       if [ $differential -eq 1 ] ; then
>         cp -f "$snapshotfile_full" "$snapshotfile"
>         suffix="_diff"
>       fi
>
>       tar -cpzf "${backup_basename}${suffix}.tar.gz" -g "$snapshotfile" "$backupdir"
>     fi
>     cd - > /dev/null
>     mv "${basedir}/"*.tar.gz "$targetdir"
>   ) 200>"$lockfile"
>
>   [ $synconly -eq 1 ] && continue
>   # housekeeping
>   newest_full=$(ls -1 "${targetdir}/${mailaddr}_"*_full.tar.gz 2>/dev/null | sort | tail -1)
>   if [ -n "$newest_full" ] ; then
>     #echo "Cleaning up files older than $newest_full..."
>     find "$targetdir" -depth -maxdepth 1 -name "${mailaddr}_*" ! -newer "$newest_full" ! -samefile "$newest_full" -printf 'Deleting %p...\n' -delete
>   fi
> done
>
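
The three cron lines in the script header pick the backup type from the
day of the month and the weekday: a full backup on the first Saturday of
the month, a differential one on the other Saturdays and an incremental
one on all other days. As a sketch, here is the same decision written as
a shell function. The name backup_type is hypothetical and not part of
the script above.

```shell
#!/bin/bash

# Print the backup type for a given day of month ($1, e.g. "08") and
# ISO weekday ($2, 1 = Monday ... 6 = Saturday, 7 = Sunday).
backup_type() {
  local dom=$((10#$1))   # force base 10 so "08" and "09" are not read as octal
  local dow=$2
  if [ "$dow" -ne 6 ] ; then
    echo "incremental"
  elif [ "$dom" -le 7 ] ; then
    echo "full"
  else
    echo "differential"
  fi
}

# Usage:
# backup_type "$(date +%d)" "$(date +%u)"
```

The crontab uses the test $(date +\%u) guard because cron runs a job
when either the day-of-month or the day-of-week field matches if both
are restricted, so "first Saturday" cannot be expressed with the two
fields alone.
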