Alexey Panov
2021-Jan-21 14:10 UTC
[BUG REPORT] In some cases dovecot sends (huge) binary data to solr for indexing
In some cases (the exact conditions are still unknown) dovecot sends binary data (attachments) to Solr for indexing. This dramatically reduces index and overall FTS efficiency.

In extreme cases (a 20 MB example is given below) dovecot's hardwired 60 s timeout is triggered during the HTTP exchange with Solr on just a single file. This leaves the index unfinished, and during initial indexing the process is restarted over and over. With multiple affected mailboxes, even moderate usage can cause an I/O overload of the whole system.

Message example (doveadm fetch text): https://filebin.ca/5oy5Wc1QrBK3/fetch-text.obfuscated.txt
Corresponding raw log data: https://filebin.ca/5oy6yqLSCr3H/rawlog.obfuscated.txt

(Both files were processed with perl doveadm-obfuscate.pl <https://www.dovecot.org/tools/doveadm-obfuscate.pl>; the script doesn't replace non-Latin characters, so they were replaced with 'R' manually.)

Workaround: there is a useful patch by John Fawcett <https://www.mail-archive.com/dovecot at dovecot.org/msg82296.html> that allows setting a maximum message body size for FTS indexing. It works perfectly, but affected messages are completely ignored by FTS.

This bug report is a summarised result of this discussion <https://www.mail-archive.com/dovecot at dovecot.org/msg82599.html>.
John Fawcett
2021-Jan-21 18:33 UTC
[BUG REPORT] In some cases dovecot sends (huge) binary data to solr for indexing
On 21/01/2021 15:10, Alexey Panov wrote:
> In some cases (the exact conditions are still unknown) dovecot sends
> binary data (attachments) to Solr for indexing. This dramatically
> reduces index and overall FTS efficiency.
>
> In extreme cases (a 20 MB example is given below) dovecot's hardwired
> 60 s timeout is triggered during the HTTP exchange with Solr on just a
> single file. This leaves the index unfinished, and during initial
> indexing the process is restarted over and over. With multiple affected
> mailboxes, even moderate usage can cause an I/O overload of the whole
> system.
>
> Message example (doveadm fetch text):
> https://filebin.ca/5oy5Wc1QrBK3/fetch-text.obfuscated.txt
> Corresponding raw log data:
> https://filebin.ca/5oy6yqLSCr3H/rawlog.obfuscated.txt
>
> (Both files were processed with perl doveadm-obfuscate.pl
> <https://www.dovecot.org/tools/doveadm-obfuscate.pl>; the script
> doesn't replace non-Latin characters, so they were replaced with 'R'
> manually.)
>
> Workaround: there is a useful patch by John Fawcett
> <https://www.mail-archive.com/dovecot at dovecot.org/msg82296.html> that
> allows setting a maximum message body size for FTS indexing. It works
> perfectly, but affected messages are completely ignored by FTS.
>
> This bug report is a summarised result of this discussion
> <https://www.mail-archive.com/dovecot at dovecot.org/msg82599.html>.

Alexey,

just a couple of questions.

I expect that messages whose size exceeds the configurable limit introduced by my patch submission are not completely ignored, but that their headers are still indexed. I don't have time to check it now, but I'm pretty sure about it. Do you have evidence that the messages are not being indexed at all? The intended behaviour of the fts_max_size setting in my patch was to bypass only message body indexing, not to bypass indexing completely.
Are you requesting a different behaviour from the one provided by the patch? I imagine that people would find it useful to still parse the message body up to the limit. That would be a little more tricky, but potentially a good idea for a further enhancement.

Thanks
John
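For illustration only, the two behaviours discussed in this thread (skipping the body above the limit versus indexing the body up to the limit) can be sketched as below. This is not Dovecot code and not the patch itself: the Message type, the text_to_index function, the truncate flag, and the choice to compare the limit against the body size in bytes are all hypothetical, and only the fts_max_size name comes from the thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical, simplified message: header text plus body text."""
    headers: str
    body: str


def text_to_index(msg: Message, fts_max_size: int, truncate: bool = False) -> List[str]:
    """Return the text chunks that would be handed to the FTS backend.

    Assumption: fts_max_size is a limit in bytes applied to the body only.
    """
    chunks = [msg.headers]  # headers are indexed regardless of body size
    body_bytes = msg.body.encode("utf-8")

    if len(body_bytes) <= fts_max_size:
        # Body fits within the limit: index it in full.
        chunks.append(msg.body)
    elif truncate:
        # Possible enhancement: index the body only up to the limit.
        # errors="ignore" drops a multi-byte character cut at the boundary.
        chunks.append(body_bytes[:fts_max_size].decode("utf-8", errors="ignore"))
    # Otherwise (intended behaviour described for the patch): skip the body,
    # keep the headers.
    return chunks
```

With truncate left at False, an oversized message still yields its headers, which is the behaviour John describes as intended; passing truncate=True models the suggested enhancement.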