Reindl Harald
2014-Jun-10 13:24 UTC
OT - Finding/removing duplicate emails - WAS: Re: [Dovecot] dovecot/lmtp munmap()-ing a lot
On 10.06.2014 15:17, Steffen Kaiser wrote:
> On Tue, 10 Jun 2014, Charles Marcus wrote:
>> On 6/9/2014 5:44 PM, Ralf Hildebrandt <r at sys4.de> wrote:
>>> That's probably the problem here. The user had LOTS of (duplicate!)
>>> mails in his inbox.
>
>> Anyone ever found a reliable way to do this?
>
>> It sure would be nice if dovecot could perform this on a per account and/or per maildir/mailbox basis with a
>> simple doveadm command...
>
> The basic question is: what is a duplicate?
>
> I spot 100% duplicates within the same Maildir mailbox with a script similar to "fdupes"
> http://linux.die.net/man/1/fdupes .
> Because a user may copy messages around, I scan one mailbox at a time.
>
> For some rare cases, where I merge two accounts, I use a script that looks for the message id in one account and
> removes all messages with the same id in the other account. Then I merge the Maildirs.
>
> However, I would call neither script general enough for automatic processing

dbmail just has "suppress_duplicates = yes" as a global setting: it silently ignores *newly received* messages with the same message-id addressed to the same user.

That's fine for people who can't handle a mailing list and hit reply-all every time.
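For readers who want to try the "fdupes"-like check on a single mailbox, here is a minimal sketch in Python. It is not the script mentioned above (which was not posted); the function name `find_exact_duplicates` is hypothetical, and it assumes a standard Maildir layout with `cur/` and `new/` subdirectories. It only finds byte-identical files, so two copies of a message that differ in a single delivery header will not be reported, and it only lists candidates without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(mailbox_dir):
    """Group byte-identical message files in ONE Maildir mailbox.

    Returns a list of groups; each group is a list of paths whose
    contents hash to the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    for sub in ("cur", "new"):
        folder = Path(mailbox_dir) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_digest[digest].append(path)
    # Keep only digests that occur more than once.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Scanning one mailbox per call mirrors the point made in the quoted text: a user may deliberately keep copies of the same message in different mailboxes, so duplicates are only looked for within one mailbox.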
Steffen Kaiser
2014-Jun-10 13:39 UTC
OT - Finding/removing duplicate emails - WAS: Re: [Dovecot] dovecot/lmtp munmap()-ing a lot
On Tue, 10 Jun 2014, Reindl Harald wrote:
> On 10.06.2014 15:17, Steffen Kaiser wrote:
>> However, I would call neither script general enough for automatic processing
>
> dbmail has just "suppress_duplicates = yes" and silently ignores
> *new received* messages with the same message-id to the same user
> as a global setting

Wasn't there a thread some days/weeks ago saying that Pigeonhole behaves the same way by default, in which the poster asked how long Pigeonhole remembers the ids?

Actually, I still wonder whether the same message-id is sufficient to decide to "silently drop" a message, as I interpret "to ignore a message" as "to drop". The copies might have come via different paths, some MUAs might not generate ids that are world-wide unique or time-dependent, ... . It's a matter of taste, IMHO.

-- 
Steffen Kaiser
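To make the behaviour under discussion concrete, here is a minimal sketch in Python of time-limited duplicate suppression keyed on the pair (user, message-id). It is not dbmail's or Pigeonhole's implementation; the class name `SeenMessageIds`, the one-hour default period, and the in-memory dictionary are all assumptions made for illustration.

```python
import time


class SeenMessageIds:
    """Remember (user, Message-ID) pairs for a limited period (hypothetical)."""

    def __init__(self, period_seconds=3600.0, clock=time.monotonic):
        self.period = period_seconds  # assumed default; not taken from any real server
        self.clock = clock
        self._seen = {}  # (user, message_id) -> time the id was first seen

    def is_duplicate(self, user, message_id):
        """Return True if this user already received this id within the period."""
        now = self.clock()
        # Forget entries older than the tracking period.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.period}
        key = (user, message_id.strip())
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

The sketch also shows where the concern raised above bites: the decision rests on the message-id alone, so two different messages that happen to carry the same id within the period would be treated as one.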