Zenaan Harkness
2019-Sep-12 06:57 UTC
Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)
I am wondering why sieve-filter is so slow compared to Gnu sieve.

I run mpop (like getmail) to download from a pop3 server to a local mbox file: ~/mail/email-incoming-unsorted

This step is very fast.

The next step: I throw the email-incoming-unsorted mbox file at a sieve processor, to sort the emails from that mbox into other mboxes, according to the sieve rules file.

Up until a couple of days ago I was using Gnu sieve.

Gnu sieve balks on emails which have no x-message-id (?? something like this) header field, so after a few years, I finally decided to switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.

Using Gnu sieve, this mbox sorting step was even faster than mpop (/ getmail) - and mpop and getmail are really fast (compared with fetchmail), since they pipeline the email downloads.

Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds at most. Super fast.

Using sieve-filter, all emails are being processed - including those without "message id header". This is good.

But sieve-filter is also really slow - slower than the download step by an order of magnitude or two.

See below for details, any ideas appreciated.

To add to the below, I added:

mbox_very_dirty_syncs = yes

to the sieve-filter config, which slightly improves performance, but not by much (in comparison with Gnu sieve).

TIA,

----- Forwarded message from Zenaan Harkness <zenaan at freedbms.net> -----

From: Zenaan Harkness <zenaan at freedbms.net>
To: debian-user at lists.debian.org
Date: Thu, 12 Sep 2019 08:06:12 +1000
Subject: Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
> Why is Gnu sieve so extremely fast to batch process an mbox file, but
> while Dovecot's sieve-filter is an order of magnitude slower?
>
> Sequence:
>
> - mpop or getmail to pipeline download emails into temp mbox file
> - filter that file
>
> Gnu sieve just flies through a local mbox file and saving emails to
> other local mbox files.
>
> Gnu sieve rejects too many emails with "malformed" errors, so after a
> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
>
> Dovecot's sieve-filter, at present, is an order of magnitude slower.
>
> Here's my filter command (one line):
>
> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted
>
> The sieve script is fine now that I have the correct "require"
> clauses (hint: "capability strings").
>
> File ~/etc/email/sieve-dovecot-config.conf:
>
> protocols = pop
> lda_mailbox_autocreate = yes
> lda_mailbox_autosubscribe = yes
> mail_fsync = never
>
> There's no re-sending of emails into my local Postfix SMTP server - I
> checked the system logs and confirmed this (journalctl -f).
>
> I suspect that Gnu sieve was directly writing each email to the
> appropriate sieve-determined mbox file (perhaps with only a sync at
> the end of a single batch process - what I've attempted to achieve
> above with sieve-filter), and that sieve-filter is instead passing
> each email through some (dovecot) lda?
>
> Here's the output for a sieve-filter batch processing of 11 emails:
>
> $ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
> # PS0 Timestamp: 20190912 at 07:02:23
> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
> info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=OYDeDX4FMcoRdGdQ at mail.gmail.com>: stored mail into mailbox 'l/cp/cp'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
> info: msgid=<15675101930.d5ba2E.12322 at composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<23955051567513749 at sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'.
> info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=iGdajTSzkfQ5PCZsUfyg at mail.gmail.com>: stored mail into mailbox 'l/gl/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
> info: msgid=<20190903133420.GS6166 at eeg.ccf.org>: stored mail into mailbox 'l/deb/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<0715adb7-540f-4cff-9282-e1252c53c2e8 at googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa17d at googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<CAB5c7xpHCdFx1w3yA9FyRL-KQ8BUiCr4JbiDQRuFJj9nOgKxTg at mail.gmail.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'.
> info: msgid=<20190903151022.354xpe6ds2vglher at red.localdomain>: stored mail into mailbox 'l/as/users'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'.
> info: msgid=<160901d8-b903-9e9a-91ac-267571b0e24d at gmx.com>: stored mail into mailbox 'l/hl/fabric'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffcc07 at googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> 2 ??? zen at eye 20190912 at 07:02:30 ~ $
>
>
> So about 3/4 of a second is spent by dovecot's sieve-filter, on each
> email that it processes - watching it is painful given how fast Gnu
> sieve has been for the last few years - it's almost (but not quite)
> as slow as my previous fetchmail email download per-email time.
>
> Attached is a -D debug run of sieve-filter on 20 emails - slightly
> longer than the above, and took roughly 15 seconds to run.
>
> Any help appreciated...

On another test run of ~600 emails, sieve-filter is consistently running ~100% of one CPU (for about 4 minutes) to process these emails, which leads to the conclusion that despite what looks like it should be a batch process, sieve-filter is perhaps reloading the rules for every single email that it processes, even though I gave it a whole mbox, and not a single email, to process.

Can sieve-filter work the way it should / the way I want it / batch process a whole mbox - without reloading the sieve rules for every email?

----- End forwarded message -----
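For reference, the per-message cost implied by the figures in this message can be worked out directly. This is only a minimal Python sketch: the helper name seconds_per_message is made up for illustration, the two timestamps are the shell-prompt times shown around the 11-message run, and the "~600 emails in about 4 minutes" figure is the approximate one quoted above.

```python
from datetime import datetime


def seconds_per_message(start, end, count, fmt="%H:%M:%S"):
    """Average wall-clock seconds per message between two timestamps."""
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / count


# 11 emails between the two prompt timestamps in the log above.
print(round(seconds_per_message("07:02:23", "07:02:30", 11), 2))  # 0.64

# ~600 emails in about 4 minutes (approximate figures from the post).
print(round(4 * 60 / 600, 2))  # 0.4
```

Together with the 20-email debug run (roughly 15 seconds), that puts the three runs at somewhere between about 0.4 and 0.75 seconds per message. The prompt timestamps only have one-second resolution, so the small-run figure is rough.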
Sami Ketola
2019-Sep-12 10:29 UTC
Don't use mbox. It is a very slow format when mails need to be deleted from the middle: the whole mbox file is basically rewritten each time. Use sdbox instead.

Sami
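To put a rough number on the "rewriting the whole mbox file each time" point, here is a small Python sketch of a simplified cost model. It is an assumption-laden illustration, not a measurement of what Dovecot actually does: it assumes every expunge of the first message causes all remaining bytes of the source mbox to be written again. The function name is hypothetical; the sizes are the byte counts from the 11-message log in the original post.

```python
def bytes_rewritten_per_expunge(sizes):
    """Bytes rewritten if each expunge of the first message rewrites the rest.

    Simplified model (assumption): after a message is moved out of the
    source mbox, everything that follows it is written out again.
    """
    total = 0
    remaining = sum(sizes)
    for size in sizes:
        remaining -= size   # this message leaves the source mbox
        total += remaining  # the rest of the file is rewritten
    return total


# Message sizes in bytes, taken from the 11-message run in the original post.
sizes = [10240, 12968, 20461, 18065, 13342, 12390,
         12220, 25313, 7567, 14858, 13337]

print("single pass over the file:", sum(sizes))
print("rewrite after each expunge:", bytes_rewritten_per_expunge(sizes))
```

In this model the rewritten volume grows roughly with the square of the number of messages, while a single pass grows linearly, which is why a per-message expunge on mbox can get expensive on large batches.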
@lbutlr
2019-Sep-12 13:49 UTC
On Sep 12, 2019, at 12:57 AM, Zenaan Harkness <zenaan at freedbms.net> wrote:
> The next step, I throw the email-incoming-unsorted mbox file at a
> sieve processor, to sort the emails from that mbox, into other
> mboxes, according to the sieve rules file.

I would expect mbox is the worst possible format choice for this.

> Gnu sieve balks on emails which have no x-message-id (?? something
> like this) header field, so after a few years, I finally decided to
> switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.
>
> Using Gnu sieve, this mbox sorting step was even faster than mpop (/
> getmail) - and mpop and getmail are really fast (compared with
> fetchmail), since they pipeline the email downloads.

Perhaps because of its reliance on the header allowing it to index?

> Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds
> at most. Super fast.

That doesn't sound fast. I processed a few thousand messages through sieve in less than 10 seconds, if I recall correctly.

> See below for details, any ideas appreciated.

The first thing I would do is download to Maildir and see what the difference is.

--
What we have here is a failure to communicate.
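One possible way to try the Maildir comparison without touching the download step is to convert a copy of the existing mbox first. The sketch below uses only Python's standard mailbox module; the function name and any paths passed to it are hypothetical, and it is not a Dovecot tool. Whether the resulting directory matches the folder layout a given Dovecot mail_location expects is an assumption to check before relying on it.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a (new) Maildir.

    Returns the number of messages copied. The source mbox is only read,
    never modified.
    """
    src = mailbox.mbox(mbox_path, create=False)         # fail if the mbox is missing
    dest = mailbox.Maildir(maildir_path, create=True)   # create cur/new/tmp if needed
    copied = 0
    try:
        for message in src:
            dest.add(message)  # stored as one file per message
            copied += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return copied
```

For example, mbox_to_maildir("email-incoming-unsorted", "/tmp/incoming-maildir") would copy the unsorted mbox into a scratch Maildir (the destination path is just a placeholder), which could then be used as the source for a timing comparison.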