On Thu, 18 Mar 2021, Plutocrat wrote:

> I've been looking around for a solution to this problem. I want to prune down
> the attachments on a server before a migration. Some of the emails are 7
> years old and have 40Mb attachments, so this seems like a good opportunity to
> rationalize things. So perhaps I'd like to "Remove all attachments from
> emails older than 2 years, in the .Sent directory", or "Attachments over 10Mb
> anywhere in the mail tree"
>
> I've found the strip_attachments.pl script here
> <https://fossies.org/linux/Mail-Box/examples/strip-attachments.pl> which
> works fine on mbox (as tested on my local Thunderbird mboxes), but not on
> maildir which is on the dovecot server. My Perl isn't strong enough to
> re-purpose it.

If you have anything that works on mbox, it will probably work on Maildir,
as each file can be considered a single-message mbox. You can combine
the script with

    find ~user/MailDir -type f ... -exec /path/to/mbox-strip {} \;

The ... can be replaced with more file tests (like minimum size or age,
or only within */cur/) to cut down on processing.

I wrote a gawk script to slim down a multi-GB Outlook mbox
for a user, but it wasn't really complicated: it matched the
/^Content-Transfer-Encoding:.*base64/i header (virtually all bulky data
will be encoded this way), buffered the base64 data part, then output
it if it was small, or deleted/replaced/extracted it otherwise.

It was a one-off discarded tool but I can hunt for it if you're hard up.

> I've looked at ripmime and mpack/munpack, and although they seem like useful
> tools to do the job of deconstructing the mail into its constituent parts, it
> doesn't seem to help in re-building the email. I think they could be used
> with a bit of study into mail MIME structure, and used with a helper script.
>
> So before I take a deep dive into scripting my own solution, I just wanted to
> check if anyone else on the list has been through this and has some resources
> or pointers they can share, or maybe even someone to tell me "Duh, you can do
> it with doveadm of course".

MIMEDefang may help.

Joseph Tam <jtam.home at gmail.com>
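The gawk approach described above was never posted, so the following is only a rough Python sketch of the same idea, not the original tool. It watches for a Content-Transfer-Encoding: base64 header, buffers the base64 lines that follow the blank line ending that header block, and either passes them through or swaps them for a short base64-encoded note so the part still decodes. The function names, the threshold and the wording of the note are all assumptions made for illustration.

```python
import base64
import re

# Assumed threshold: base64 bodies with more encoded characters than this are replaced.
MAX_ENCODED_BYTES = 100_000

CTE_BASE64 = re.compile(r"^Content-Transfer-Encoding:.*base64", re.IGNORECASE)
BASE64_LINE = re.compile(r"^[A-Za-z0-9+/=]+$")


def _emit(buffered, limit):
    """Yield the buffered base64 lines if small, else one replacement line."""
    size = sum(len(line) for line in buffered)
    if size <= limit:
        yield from buffered
    else:
        # Encode the note as base64 so the part header stays truthful.
        note = f"[base64 data removed: {size} encoded bytes]\n".encode("ascii")
        yield base64.b64encode(note).decode("ascii") + "\n"


def strip_large_base64(lines, limit=MAX_ENCODED_BYTES):
    """Filter an iterable of message lines, shrinking oversized base64 bodies."""
    state = "text"      # "text", "headers" (base64 header seen) or "data"
    buffered = []
    for line in lines:
        if state == "data":
            if BASE64_LINE.match(line.strip()):
                buffered.append(line)
                continue
            # A blank line or MIME boundary ends the base64 data.
            yield from _emit(buffered, limit)
            buffered = []
            state = "text"
        if state == "headers":
            if line.strip() == "":
                state = "data"   # blank line: the encoded body starts next
            yield line
            continue
        if CTE_BASE64.match(line):
            state = "headers"
        yield line
    if buffered:
        yield from _emit(buffered, limit)
```

It works on any iterable of text lines, so it can be fed an open file (for example one opened with encoding="latin-1" so every byte round-trips) and the result written to a new file. Like the gawk original it does not understand MIME structure; it only recognises the header and the shape of base64 lines.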
On 19/03/2021 07.31, Joseph Tam wrote:

>> I've found the strip_attachments.pl script here <https://fossies.org/linux/Mail-Box/examples/strip-attachments.pl> which works fine on mbox (as tested on my local Thunderbird mboxes), but not on maildir which is on the dovecot server. My Perl isn't strong enough to re-purpose it.
>
> If you have anything that works on mbox, it will probably work on Maildir
> as each file can be considered a single message mbox. You can combine
> the script with
>
>     find ~user/MailDir -type f ... -exec /path/to/mbox-strip {} \;

I thought that too, but my initial test on a single message file didn't work like that. I think I got a zero length file. I'll dig into the code to see if I can figure it out, although my Perl hasn't been used for 20 years or so ...

> The ... can be replaced with more file tests (like minimum size or age or only within */cur/) to cut down on processing.

Sure. I'm quite handy with find, sed, awk and all that bash malarkey. I was actually wondering if it could be done with those alone, but it would make more sense to use a library which understands mime already, and does the heavy lifting. This approach might be good as a last resort.

> MIMEDefang may help.

Nice. Thanks for the pointer.

P.
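If the stripping step ends up being written in Python rather than driven from find, the same file tests (only within cur/, minimum size, minimum age) can be sketched as below. This is an illustration under assumptions: the function name and default thresholds are invented here, and file modification time is used as a stand-in for message age, much as find -mtime would.

```python
import os
import time


def candidate_messages(maildir_root, min_bytes=10 * 1024 * 1024,
                       min_age_days=730, now=None):
    """Yield message files under any cur/ directory that are at least
    min_bytes in size and were last modified at least min_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(maildir_root):
        # Only look at delivered, already-seen mail; skip new/ and tmp/.
        if os.path.basename(dirpath) != "cur":
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_size >= min_bytes and st.st_mtime <= cutoff:
                yield path
```

The two tests are combined with AND. Passing min_bytes=0 gives "everything older than two years", and passing min_age_days=0 gives "everything over 10 MB"; pointing maildir_root at a single folder such as the .Sent directory limits the walk to that folder.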
Still can't find the magic solution to this.

- My Perl isn't good enough to re-purpose strip-attachments.pl so it works on individual emails.
- ripmime works to extract attachments only.
- altermime looked good and would delete all attachments from a directory of emails. However, it messed up the structure somehow so they wouldn't display in an email client (Thunderbird, Roundcube).
- MIMEDefang looked possible, but I couldn't figure out how to use it as a standalone script.
- PHP solutions, including the promising https://github.com/php-mime-mail-parser/php-mime-mail-parser, seem only to be able to save attachments from the email, not delete them.

I'll keep going I guess. I can't believe I'm the only person in the world to want to do this though ...

P.
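One option not tried in the list above is Python's standard email package, which both parses and re-serialises MIME. The following is a minimal sketch, not a tested migration tool: it replaces every part that has an attachment disposition or a filename, and whose decoded size meets a threshold, with a short text/plain note, leaving the rest of the MIME tree in place. The function name, the threshold parameter and the wording of the note are assumptions; note also that parts with a filename but an inline disposition (for example embedded images) are treated as attachments here.

```python
import email
from email import policy


def strip_attachments(raw_bytes, min_bytes=0):
    """Return (new_raw_bytes, removed_count) for one raw message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    removed = 0
    # Materialise the walk first, since parts are modified in place.
    for part in list(msg.walk()):
        if part.is_multipart():
            continue  # containers are kept; only leaf parts are replaced
        if not (part.is_attachment() or part.get_filename()):
            continue  # ordinary body text
        payload = part.get_payload(decode=True) or b""
        if len(payload) < min_bytes:
            continue
        name = part.get_filename() or "unnamed"
        ctype = part.get_content_type()
        # set_content() clears the old Content-* headers and payload,
        # then installs a text/plain body in their place.
        part.set_content(
            f"[attachment removed: {name}, {ctype}, {len(payload)} bytes]\n"
        )
        removed += 1
    return msg.as_bytes(), removed
```

The function takes and returns raw bytes, so it can be applied to one Maildir file at a time. Re-serialising may re-fold long header lines, so the output is not guaranteed to be byte-identical outside the replaced parts; running it against a copy of the mail tree and checking the results in a mail client first would be prudent.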
justina colmena ~biz
2021-Apr-01 02:33 UTC
Mass Stripping Attachments by Directory, Age, Size
Well ain't that rich?

To use an allegory of sorts, we're going to have to start using staples rather than paperclips with our email attachments, and one unified digital signature on the whole message as sent rather than a separate signature for each enclosure as commonly "done" with PGP, GnuPG, etc.

On March 30, 2021 7:39:02 PM AKDT, Plutocrat <plutocrat at gmail.com> wrote:
> Still can't find the magic solution to this.