On 3/9/2016 9:02 PM, Timo Sirainen <tss at iki.fi> wrote:
> On 08 Mar 2016, at 01:50, Pavel Stano <stanojr at websupport.sk> wrote:
>>
>> SIS attachment deduplication is broken in 2.2.16 and later.
>> It is caused by this commit:
>> https://github.com/dovecot/core/commit/664bf3e236c214aee86294483c379e4fa66c2e63
>>
>> In src/lib-fs/fs-sis.c, the function fs_sis_try_link() compares the
>> inodes of the hash files. After that commit, fs_stat() uses fstat() on
>> the open fd of the temporary file instead of stat() on the filename,
>> but that temporary file has a different inode.
>>
>> It does not cause any corruption, but it does not save any space,
>> because every duplicate attachment ends up in a separate file.
>
> Thanks, fixed: https://github.com/dovecot/core/commit/3b39022ea0513363241cf852b7d454c841584ea1

So, after the fix is applied, does dovecot silently delete the
duplicated files, or is there a command that needs to be run manually?
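
For readers following the bug report, here is a minimal sketch of why the inode comparison can never match when fstat() is applied to the temporary file's descriptor. It is not Dovecot code: the file names `hashfile` and `hashfile.tmp` and the helper `inode_views` are made up for illustration, and it only shows that two separate files with identical content always report different inodes.

```python
import os
import tempfile


def inode_views(directory):
    """Return (inode from stat() on the hash path, inode from fstat() on a temp fd).

    Both files are hypothetical stand-ins: 'hashfile' plays the existing
    hash file, 'hashfile.tmp' plays the temporary file being written.
    """
    hash_path = os.path.join(directory, "hashfile")
    with open(hash_path, "wb") as f:
        f.write(b"attachment")

    tmp_path = os.path.join(directory, "hashfile.tmp")
    fd = os.open(tmp_path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        os.write(fd, b"attachment")
        # stat() looks up the name; fstat() reports on whatever the fd points at.
        return os.stat(hash_path).st_ino, os.fstat(fd).st_ino
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        by_name, by_fd = inode_views(d)
        print("stat(hash path):", by_name, "fstat(temp fd):", by_fd)
```

Same content, different inodes: a check of the form "is this already the same inode as the hash file?" fails every time if it is fed the temporary file's descriptor.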

> On 11 Mar 2016, at 02:37, Charles Marcus <CMarcus at Media-Brokers.com> wrote:
>
> So, after the fix is applied, does dovecot silently delete the
> duplicated files, or is there a command that needs to be run manually?

You'd have to do it manually in some way. A script that does something like:

Go through all attachment directories and, in each one:
 - Sort the files by filename
 - Identify files A and B that are the same (the filenames begin with the same hash) but have different inodes
 - ln A B.tmp && mv B.tmp B
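
The outline above could be sketched roughly as follows. This is an untested-in-production illustration, not an official tool: the function name `relink_duplicates` is invented here, it assumes file names of the form `<hash>-<suffix>` as in the outline, it trusts the hash prefix without comparing file contents, and it should only be tried on a copy or while nothing else is writing to the directory.

```python
import os
from collections import defaultdict


def relink_duplicates(directory):
    """Hard-link files in one directory that share a hash prefix.

    Returns the number of files replaced by a hard link. The
    '<hash>-<suffix>' naming is an assumption taken from the outline.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):  # "sort files by filename"
        path = os.path.join(directory, name)
        if name.endswith(".tmp") or "-" not in name:
            continue  # skip leftovers and files without a hash prefix
        if os.path.isfile(path) and not os.path.islink(path):
            groups[name.split("-", 1)[0]].append(path)

    relinked = 0
    for paths in groups.values():
        keep = paths[0]  # file A
        keep_stat = os.stat(keep)
        for dup in paths[1:]:  # file B
            st = os.stat(dup)
            if (st.st_dev, st.st_ino) == (keep_stat.st_dev, keep_stat.st_ino):
                continue  # already the same inode
            tmp = dup + ".tmp"
            os.link(keep, tmp)    # ln A B.tmp
            os.replace(tmp, dup)  # mv B.tmp B (atomic rename over B)
            relinked += 1
    return relinked
```

To cover a whole attachment tree, one could call it for every directory yielded by `os.walk()`. Linking to a temporary name and renaming it over B means B's path always exists, pointing at either the old or the new inode.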

On 11.03.2016 at 01:56, Timo Sirainen wrote:
> You'd have to do it manually in some way. A script that does something like:
>
> Go through all attachment directories and for each file:
> - Sort files by filename
> - Identify that files A and B the same (beginning of the filename begins with same hash), but have a different inode
> - ln A B.tmp && mv B.tmp B

This is how it works in sis-queue, correct? Wouldn't it be nice to adapt doveadm sis deduplicate to handle this?

regards
--
Harald Leithner
ITronic
Wiedner Hauptstraße 120/5.1, 1050 Wien, Austria
Tel: +43-1-545 0 604
Mobil: +43-699-123 78 4 78
Mail: leithner at itronic.at | itronic.at

On 3/10/2016 7:56 PM, Timo Sirainen <tss at iki.fi> wrote:
> You'd have to do it manually in some way. A script that does something like:
>
> Go through all attachment directories and for each file:
> - Sort files by filename
> - Identify that files A and B the same (beginning of the filename begins with same hash), but have a different inode
> - ln A B.tmp && mv B.tmp B

Ugh... ok thanks, but it seems like that would be much safer as a doveadm command...

On 11.03.16 3:56, Timo Sirainen wrote:
> You'd have to do it manually in some way. A script that does something like:
>
> Go through all attachment directories and for each file:
> - Sort files by filename
> - Identify that files A and B the same (beginning of the filename begins with same hash), but have a different inode
> - ln A B.tmp && mv B.tmp B

I've also found that many of the /hashes/ directories are missing.

# ll /tank1/vmail/attachments/1f/1f
total 3300
-rw------- 1 vmail vmail  403976 12 ??? 00:20 1f1f504c582600a2af94b39c088692aba714fe72-c53b9e1508b14356797d0100d09efc50
-rw------- 1 vmail vmail  403976 12 ??? 00:20 1f1f504c582600a2af94b39c088692aba714fe72-c93b9e1508b14356797d0100d09efc50
-rw------- 1 vmail vmail  403976 12 ??? 00:20 1f1f504c582600a2af94b39c088692aba714fe72-f2a777181eb14356807d0100d09efc50
-rw------- 1 vmail vmail  403976 12 ??? 00:20 1f1f504c582600a2af94b39c088692aba714fe72-f31a5e2917b143567e7d0100d09efc50
-rw------- 1 vmail vmail 2582016  3 ??? 00:20 1f1f97880e8cddc2dfe3c4ad2654b9da937226b7-94c53d358bd33756d6140000d09efc50

Is this related to the same bug, or is it another issue?

Is it safe to delete attachment files if there is no file with the same hash in the /hashes/ directory, or if there is no /hashes/ directory at all?

On 11.03.2016 3:56, Timo Sirainen wrote:
>> So, after the fix is applied, does dovecot silently delete the
>> duplicated files, or is there a command that needs to be run manually?
>
> You'd have to do it manually in some way. A script that does something like:
>
> Go through all attachment directories and for each file:
> - Sort files by filename
> - Identify that files A and B the same (beginning of the filename begins with same hash), but have a different inode
> - ln A B.tmp && mv B.tmp B

The problem turned out to be a bit more complicated than that. In the end I came up with this script:

https://github.com/moisseev/doveadm-tools/blob/master/bin/dsisck

It assumes that Dovecot is not running.