Love Dovecot, many thanks for it!

But, I have a question regarding the storing of emails, with respect to efficiency...

Our business model (advertising industry) is such that our users exchange a lot of emails with attachments - most less than a megabyte, but some considerably larger. Consequently, I have been looking for a good, open source imap server that doesn't store multiple copies of the same attachment - but instead, stores a checksum, and whenever a message is stored with a duplicate attachment, the attachment is stored only once, and simply referenced by some kind of link to other emails.

This would *drastically* reduce the storage requirements for our company - imagine a message with a 10MB attachment, sent to 40 of our users, sometimes more than once. Now multiply this by 3 times per day, for 5 years...

Are there any plans for Dovecot to support this type of storage in the future? Does this require the use of an SQL DB for storing the message components?

--

Best regards,

Charles
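To put rough numbers on the scenario above, here is a back-of-the-envelope sketch. It uses only the figures from the message (10 MB, 40 recipients, 3 sends per day, 5 years) and assumes 365 sending days per year and a different attachment on every send; both assumptions are mine, and real traffic will differ.

```python
# Figures taken from the message; DAYS_PER_YEAR is an assumption
# (weekends and holidays are not excluded).
ATTACHMENT_MB = 10
RECIPIENTS = 40
SENDS_PER_DAY = 3
YEARS = 5
DAYS_PER_YEAR = 365


def storage_mb(copies_per_send: int) -> int:
    """Total megabytes stored when each send keeps `copies_per_send` copies."""
    return ATTACHMENT_MB * copies_per_send * SENDS_PER_DAY * DAYS_PER_YEAR * YEARS


per_recipient_mb = storage_mb(RECIPIENTS)  # one copy in every mailbox
single_instance_mb = storage_mb(1)         # one shared copy per send

print(f"one copy per recipient: {per_recipient_mb} MB")
print(f"single shared copy:     {single_instance_mb} MB")
print(f"reduction factor:       {per_recipient_mb // single_instance_mb}x")
```

Under these assumptions the saving is simply the recipient count: storing one copy instead of 40 cuts the total by a factor of 40.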
On Thu, 2006-06-01 at 07:45 -0400, Charles Marcus wrote:
> Our business model (advertising industry) is such that our users
> exchange a lot of emails with attachments - most less than a megabyte,
> but some considerably larger. Consequently, I have been looking for a
> good, open source imap server that doesn't store multiple copies of the
> same attachment - but instead, stores a checksum, and whenever a message
> is stored with a duplicate attachment, the attachment is stored only
> once, and simply referenced by some kind of link to other emails.
>
> This would *drastically* reduce the storage requirements for our company
> - imagine a message with a 10MB attachment, sent to 40 of our users,
> sometimes more than once. Now multiply this by 3 times per day, for 5
> years...
>
> Are there any plans for Dovecot to support this type of storage in the
> future? Does this require the use of an SQL DB for storing the message
> components?

This is planned for the dbox format, in maybe a couple of months. I think the plan was to do this in the deliver agent, so that a delivered mail's attachment is shared between the mail's recipients.

I'm not sure if you're suggesting that a checksum should be taken of the attachment and used to check whether the attachment already happens to exist, and if so, that the existing copy be used. Actually, I'm not sure whether that was what I was supposed to do anyway. :)

I think that could be a good idea anyway, but how about hash collisions? I could just ignore them, since they would practically never happen. Hash + attachment size would be even safer. The only truly safe way would be to read the whole attachment from disk and compare it byte-by-byte, but that would just slow things down unnecessarily. Perhaps it should be an option.
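The trade-off described in the reply above (hash only, hash plus size, or an optional byte-by-byte check) can be sketched as a small content-addressed store. This is only an illustration and not Dovecot's dbox implementation: the class `AttachmentStore`, its methods `put` and `get`, and the `verify_bytes` option are hypothetical names, and SHA-256 is an assumed choice since the thread does not name a hash function.

```python
import hashlib
from pathlib import Path


class AttachmentStore:
    """Hypothetical single-instance attachment store (illustration only)."""

    def __init__(self, root, verify_bytes: bool = False):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # Optional byte-by-byte comparison: the "truly safe" but slower check.
        self.verify_bytes = verify_bytes

    def put(self, data: bytes) -> str:
        """Store an attachment once and return the key messages can reference."""
        # Key = hash + size, so a collision would also need matching lengths.
        digest = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        key = f"{digest}-{len(data)}"
        path = self.root / key
        if path.exists():
            # Duplicate attachment: reuse the existing copy.
            if self.verify_bytes and path.read_bytes() != data:
                raise ValueError("same hash and size but different content")
            return key
        # First time this content is seen: write it out.
        # (Concurrent deliveries are not handled in this sketch.)
        path.write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        """Read an attachment back by its key."""
        return (self.root / key).read_bytes()
```

Each stored message would then keep only the returned key in place of the attachment body. With `verify_bytes=False` a duplicate costs one hash computation and one existence check; with `verify_bytes=True` it also costs a full read of the stored copy, which is the slowdown the reply suggests making optional.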
(2006-06-01 at 07:45) Charles Marcus wrote the following to dovecot@dovecot.org:
> Love Dovecot, many thanks for it!
>
> But, I have a question regarding the storing of emails, with respect to
> efficiency...
>
> Our business model (advertising industry) is such that our users exchange a
> lot of emails with attachments - most less than a megabyte, but some
> considerably larger. Consequently, I have been looking for a good, open
> source imap server that doesn't store multiple copies of the same attachment
> - but instead, stores a checksum, and whenever a message is stored with a
> duplicate attachment, the attachment is stored only once, and simply
> referenced by some kind of link to other emails.
>
> This would *drastically* reduce the storage requirements for our company -
> imagine a message with a 10MB attachment, sent to 40 of our users, sometimes
> more than once. Now multiply this by 3 times per day, for 5 years...
>
> Are there any plans for Dovecot to support this type of storage in the
> future? Does this require the use of an SQL DB for storing the message
> components?
>

I thought I'd just mention aradis, which can extract and replace the attachment. The attachment itself can then be delivered to a webserver, the filesystem or a custom script.

http://robur.slu.se/jensl/aradis

(I am the author of aradis and we use it mainly for mailing lists. :-)

Cheers,
Jens

> --
>
> Best regards,
>
> Charles
>

-----------------------------------------------------------------------
'In theory, there is no difference between theory and practice.
 But, in practice, there is.'
-----------------------------------------------------------------------
Jens Låås                              Email: jens.laas@data.slu.se
Department of Computer Services, SLU   Phone: +46 18 67 35 15
Vindbrovägen 1
P.O. Box 7079
S-750 07 Uppsala
SWEDEN
-----------------------------------------------------------------------
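For readers unfamiliar with the "extract and replace" approach mentioned above, here is a minimal sketch of the general idea using Python's standard `email` package. It is not aradis and makes no claim about how aradis works: the function `detach_attachments` and the `save` callback are hypothetical, and the callback stands in for whatever delivers the attachment (a web server upload, a filesystem write or a custom script).

```python
from email import message_from_bytes, policy


def detach_attachments(raw: bytes, save) -> bytes:
    """Replace each attachment with a short text note pointing at its new home.

    `save(filename, data)` is a hypothetical callback that stores the
    attachment somewhere and returns a reference (for example a URL or path).
    """
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if not part.is_attachment():
            continue
        filename = part.get_filename() or "unnamed"
        data = part.get_payload(decode=True)  # decoded attachment bytes
        ref = save(filename, data)
        # set_content() clears the old body and Content-* headers first,
        # so the part becomes a plain-text placeholder.
        part.set_content(f"Attachment {filename} moved to {ref}\n")
    return msg.as_bytes()
```

A `save` callback that writes into a store keyed by content checksum would give the deduplication asked about in the original question, with every recipient's copy of the message carrying only the reference.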