Hi,

use "doveadm" to get the real size of all messages:

doveadm -f table fetch -A "size.physical" ALL | awk '{s+=$2}END{printf("%.2fMB\n", s/1024/1024);}'

189247.67MB .. 185G

Use "du" to get the size on disk.

In my case, with deduplication:

/srv/stroage/# du -s -h *
53G vmail
75G vmail_sis

Without deduplication:

/srv/stroage/# du -s -h -l *
53G vmail
209G vmail_sis

Just for info, SIS can't use the zlib plugin, so the 75G in my case are not compressed (I don't have a filesystem that I trust and that has a compression feature). Anyway, it has a 3:1 ratio in my case.

Maybe I'm interpreting SIS wrongly and it can't be counted with du -l (count links).

But if you don't have SIS, these values should point you in the right direction.

bye

Harald

On 16.03.2016 at 08:50, Götz Reinicke - IT Koordinator wrote:
> On 15.03.16 at 16:01, Götz Reinicke - IT Koordinator wrote:
>> Hi,
>>
>> maybe someone has already done that: Do you have a script(?) tool which
>> shows the efficiency of the mail compression if zlib is used?
>>
>> Something that shows the uncompressed size vs. the compressed.
>
> Hi,
>
> maybe my question was a bit misleading. But anyway thanks for your
> feedback regarding your experiences and compression rates.
>
> We already thought about the benefit of less IO and more CPU power,
> which is no concern.
>
> The mailboxes I checked also go with 40-60% compression rate.
>
> But what I was looking for was a tool or way to see what volume would be
> used if we were not using compression.
>
> e.g. "du -hs --without-zlib"
>
> Our management would like to see a graph one day which shows the volume
> uncompressed and compressed ...
>
> Adding zlib with mdbox or maildir - as we do it currently - is from my
> POV if you have the CPU power a MUST :)
>
> happy dovecoting - Götz

--
Harald Leithner

ITronic
Wiedner Hauptstraße 120/5.1, 1050 Wien, Austria
Tel: +43-1-545 0 604
Mobile: +43-699-123 78 4 78
Mail: leithner at itronic.at | itronic.at
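As a rough sketch of the "uncompressed vs. compressed" figure asked for above, the doveadm total can be compared against the on-disk total. The function names sum_physical and savings_percent are made up for this example, the path in the usage comment is only the one from the listing above, and the --block-size option assumes GNU du. It also assumes that size.physical reflects the uncompressed message size, as used in the mail above, and that no SIS store distorts the on-disk figure.

```shell
# Sum the byte counts in column 2 of
#   doveadm -f table fetch -A "size.physical" ALL
# The header row and any other non-numeric rows are skipped.
sum_physical() {
    awk '$2 ~ /^[0-9]+$/ { s += $2 } END { printf("%.0f\n", s) }'
}

# Print the percentage saved, given plain bytes ($1) and on-disk bytes ($2).
savings_percent() {
    awk -v plain="$1" -v disk="$2" 'BEGIN {
        if (plain <= 0) { print "n/a"; exit }
        printf("%.1f\n", (1 - disk / plain) * 100)
    }'
}

# Possible usage (assumes GNU du and the vmail directory shown above):
#   plain=$(doveadm -f table fetch -A "size.physical" ALL | sum_physical)
#   disk=$(du -s --block-size=1 /srv/stroage/vmail | cut -f1)
#   savings_percent "$plain" "$disk"
```

Logging both byte counts on a schedule would give the two series needed for a graph of uncompressed versus compressed volume.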
Not sure how you're seeing such a high ratio; I tried the same commands on my system (thanks for these btw) and my savings from compression are around 5% =D

That said, I'm dealing with a much smaller volume (3 GB) and I've only identified a half dozen or so attachments that don't have some kind of compression already; most modern mail programs will compress common types like images by default, and many modern file formats have compression built in, and can give better results than zlib anyway.

My biggest savings are on mailing list messages (I filter these into their own mailbox) since they tend to be longer than typical messages, especially with auto-quoting. They also tend to be very busy mailboxes, but I also don't keep them forever.

As an experiment I also tried moving my (uncompressed) messages to a compressing file-system (ZFS using lz4) but the savings were similarly small; I assume they were probably a bit better, but the extra overhead of the file-system eroded it since the savings are so small in my case. I think if you're serious about compression then a compressing file-system is the way to go, but in my case I'm on virtual hosting so there's not much point in layering a ZFS volume on top of shared storage (since it's ZFS based already for integrity/redundancy).

I just thought I'd mention my experience since people are quoting big savings that I haven't seen; I wouldn't consider my usage all that unusual. Maybe some of you are receiving a lot more newsletter type traffic (these messages can be quite large), uncompressed document type files, or are less selective in which messages are retained forever? Just a caution that people looking at compression may not see the same savings depending upon their actual content.

Spam is another bad category for compression, I've found; at least in my case the messages are usually very short, and/or contain randomised junk to try to confound filters. I'm pretty aggressive about clearing them, though: I discard messages outright above a certain threshold, and use a script to expunge messages with higher spam ratings faster, so possible false positives stick around longer and can be caught.

On 16 Mar 2016, at 09:48, Harald Leithner <leithner at itronic.at> wrote:
> [...]
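To check how much a given kind of content would actually gain, one option is to compress a sample file and compare sizes. This is only a sketch: gzip_savings is a made-up name, gzip stands in for the zlib plugin (an assumption, the real plugin may use a different level or algorithm), and it only makes sense on individual files such as maildir messages or saved attachments.

```shell
# Print the percentage gzip would save on one file.
# gzip is used here as a stand-in for Dovecot's zlib plugin (assumption).
gzip_savings() {
    orig=$(wc -c < "$1")
    comp=$(gzip -c "$1" | wc -c)
    awk -v o="$orig" -v c="$comp" 'BEGIN {
        if (o <= 0) { print "n/a"; exit }
        printf("%.1f\n", (1 - c / o) * 100)
    }'
}
```

Running it over a few messages from a mailing-list folder and a few from a folder full of already-compressed attachments should show the kind of spread described above.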
The vmail directory only holds attachments smaller than 64k; every larger attachment goes into the SIS store.

The SIS store has no compression, but it seems that attachments are stored raw and not base64 encoded, so it saves 30%? on binary data.

Also, I wrote that 'du -l' may not be the correct way to count de-duplication. It seems that every attachment has at least 2 hardlinks in the SIS; I missed that before I wrote the other mail. That also explains why the storage uses so much more space than the counted mail size ;-)

I think ignoring the hashes folder in the SIS would give better results:

find vmail_sis -type f -printf '%s %p\n' | grep -v hashes | awk '{s+=$1}END{printf("%.2fMB\n", s/1024/1024);}'

In my case this is:

142922.29MB

(So forget the 209G from my previous mail.)

doveadm -f table fetch -A "size.physical" ALL | awk '{s+=$2}END{printf("%.2fMB\n", s/1024/1024);}'

195861.12MB

du -sh vmail
56G

(it also seems that mdbox tricked me with spare file size)

Mails in mdbox storage, compressed, without index/logs:

find vmail -type f -printf '%s %p\n' | grep "/storage/m." | awk '{s+=$1}END{printf("%.2fMB\n", s/1024/1024);}'

4776.51MB

Index/logs:

find vmail -type f -printf '%s %p\n' | grep -v "/storage/m." | awk '{s+=$1}END{printf("%.2fMB\n", s/1024/1024);}'

224.40MB

So in the end I use 146.7 GB storage + 224.4 MB index/logs/metadata/overhead for 191.27 GB of plain e-mails.

I still can't tell you how much compression brings, because SIS is not compressed ;-) So someone without SIS and mdbox has to do this test.

bye

On 16.03.2016 at 11:52, Haravikk wrote:
> [...]

--
Harald Leithner
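The two find/grep/awk runs over vmail above could also be done in a single pass. The following is a sketch only: split_mdbox_sizes is a made-up name, it expects the same "size path" lines that find's -printf '%s %p\n' prints, and it matches a literal "/storage/m." in the path, which is slightly stricter than the grep pattern in the mail (where the dot matches any character).

```shell
# Read "size path" lines, e.g. from:
#   find vmail -type f -printf '%s %p\n'
# and total the mdbox storage files (storage/m.*) separately from
# everything else (index, logs, other metadata).
split_mdbox_sizes() {
    awk '{
        if ($0 ~ /\/storage\/m\./) storage += $1; else other += $1
    }
    END {
        printf("storage: %.2fMB\n", storage / 1024 / 1024)
        printf("other: %.2fMB\n", other / 1024 / 1024)
    }'
}

# Possible usage:
#   find vmail -type f -printf '%s %p\n' | split_mdbox_sizes
```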