Alexey Panov
2021-Jan-15 10:39 UTC
fts_solr: Indexing failed (fails on particular messages/mailboxes by timeout)
I guess I was indeed correct. As message body is the part of mail index dovecot indeed sends all those binary messages as a part of message body:> doveadm fetch -u test.user at interpont.com "body" mailbox KS2 uid 2 | wc --bytes > 21549696This affects FTS indexing dramatically I guess?> On 15. Jan 2021, at 17:10, Alexey Panov <programmierus at icloud.com> wrote: > > I could find messages causing problems as solr log indeed shows the IMAP UID. > > Reducing batch_size to a single message I could identify one of those messages causing the problem. > > Now I can confirm the following behaviour: there are some SINGLE messages where 60 seconds is NOT ENOUGH to index. The message I am looking at now is 20MB big (compressed 17MB) and has a lot of inline pasted screenshots. I guess that dovecot might send those pasted images as a part of the message body to solr? > > Here are some logs for better understanding: > > Folder contents (single message): > >> doveadm search -u u at d mailbox KS2 >> e836ff1d20640160340b0000532cee39 2 > > > Force indexing: > >> doveadm index -u u at d KS2 >> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out (Request queued 61.105 secs ago, 1 send attempts in 61.089 secs, 60.071 in http ioloop, 0.000 in other ioloops, connected 61.089 secs ago) >> doveadm(u at d): Error: Mailbox KS2: Transaction commit failed: FTS transaction commit failed: backend deinit > > > Corresponding solr logs: > >> solr-mailcow_1 | 2021-01-15 10:02:34.901 INFO (qtp524223214-2127) [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] webapp=/solr path=/update params={}{add=[2/e836ff1d20640160340b0000532cee39/u at d (1688946475272241152)]} 0 80683 > > I would greatly appreciate any help. My users using pasted images heavily (designers exchanging mockups etc) and if this truly is a problem then I am screwed :) > > >> On 14. Jan 2021, at 23:46, Alexey Panov <programmierus at icloud.com <mailto:programmierus at icloud.com>> wrote: >> >> Hello, >> >> I am running recent dovecot 2.3.13 (89f716dc2) within dockerized mailcow setup. >> >> During just another update routine I decided to reindex all mailboxes and some of them fail (timeout) on exactly SAME messages in SAME mailboxes: >> >>> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out (Request queued 73.213 secs ago, 1 send attempts in 73.098 secs, 60.597 in http ioloop, 0.000 in other ioloops, connected 127.875 secs ago) >>> doveadm(u at d): Error: Mailbox Sent: Mail search failed: Internal error occurred. Refer to server log for more information. [2021-01-14 11:23:46] >>> doveadm(u at d): Error: Mailbox Sent: Transaction commit failed: FTS transaction commit failed: backend deinit >> >> This error is persistent and affects only some, but always same folders of same mailboxes (i mean solr can't finish indexing them at all). >> >> I/O is in perfect condition and has no corruptions. I also tried to do fts rescan which made no change. RAM is fine, I haven't seen OOM conditions. >> >> During my research I found this discussion <https://www.mail-archive.com/dovecot at dovecot.org/msg76944.html> which lead me idea of reducing the batch_size to as low as 10 (for testing purposes). Observing SOLR logs during reindexing on the mailbox/folder causing the error I noticed the following strange behaviour. >> >> Firstly, it goes very fast but then always kinda freezes on these messages: >> >>> solr-mailcow_1 | 2021-01-14 16:05:09.615 INFO (qtp524223214-1151) [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] webapp=/solr path=/update params={}{add=[6883/78a0ec27b091da5e4054000013681708/u at d (1688878773188952064), 6884/78a0ec27b091da5e4054000013681708/u at d (1688878773203632128), 6885/78a0ec27b091da5e4054000013681708/u at d (1688878773218312192), 6886/78a0ec27b091da5e4054000013681708/u at d (1688878773227749376), 6887/78a0ec27b091da5e4054000013681708/u at d (1688878773237186560), 6888/78a0ec27b091da5e4054000013681708/u at d (1688878773244526592), 6889/78a0ec27b091da5e4054000013681708/u at d (1688878773250818048), 6890/78a0ec27b091da5e4054000013681708/u at d (1688878773259206656), 6891/78a0ec27b091da5e4054000013681708/u at d (1688878773272838144), 6892/78a0ec27b091da5e4054000013681708/u at d (1688878773288566784)]} 0 101 >>> solr-mailcow_1 | 2021-01-14 16:06:42.540 INFO (qtp524223214-1143) [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] webapp=/solr path=/update params={}{add=[6893/78a0ec27b091da5e4054000013681708/u at d (1688878773329461248), 6894/78a0ec27b091da5e4054000013681708/u at d (1688878773332606976), 6895/78a0ec27b091da5e4054000013681708/u at d (1688878774361260032), 6896/78a0ec27b091da5e4054000013681708/u at d (1688878850290745344), 6897/78a0ec27b091da5e4054000013681708/u at d (1688878851347709952), 6898/78a0ec27b091da5e4054000013681708/u at d (1688878852404674560), 6899/78a0ec27b091da5e4054000013681708/u at d (1688878853586419712), 6900/78a0ec27b091da5e4054000013681708/u at d (1688878870704422912), 6901/78a0ec27b091da5e4054000013681708/u at d (1688878870709665792), 6902/78a0ec27b091da5e4054000013681708/u at d (1688878870727491584)]} 0 92891 >> >> >> It takes almost one and a half minute to get those ten messages (6893-6902) indexed. It freezes exactly on that position every time I restart indexing. Actually only reducing the batch_size to 10 made it through. It was timing out with the default batch_size exactly here each time. >> >> Than it goes further again blazing fast until it reaches here: >> >>> solr-mailcow_1 | 2021-01-14 16:07:26.148 INFO (qtp524223214-1151) [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] webapp=/solr path=/update params={}{add=[7443/78a0ec27b091da5e4054000013681708/u at d (1688878884286627840), 7444/78a0ec27b091da5e4054000013681708/u at d (1688878884293967872), 7445/78a0ec27b091da5e4054000013681708/u at d (1688878884304453632), 7446/78a0ec27b091da5e4054000013681708/u at d (1688878884502634496), 7447/78a0ec27b091da5e4054000013681708/u at d (1688878900331937792), 7448/78a0ec27b091da5e4054000013681708/u at d (1688878900508098560), 7449/78a0ec27b091da5e4054000013681708/u at d (1688878916409753600), 7450/78a0ec27b091da5e4054000013681708/u at d (1688878916424433664), 7451/78a0ec27b091da5e4054000013681708/u at d (1688878916438065152), 7452/78a0ec27b091da5e4054000013681708/u at d (1688878916448550912)]} 0 30682 >>> solr-mailcow_1 | 2021-01-14 16:09:40.277 INFO (qtp524223214-1325) [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] webapp=/solr path=/update params={}{add=[7463/78a0ec27b091da5e4054000013681708/u at d (1688879057103486976)]} 0 1 >>> solr-mailcow_1 | 2021-01-14 16:10:00.628 INFO (qtp524223214-1009) [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] webapp=/solr path=/update params={}{add=[7453/78a0ec27b091da5e4054000013681708/u at d (1688878916486299648), 7454/78a0ec27b091da5e4054000013681708/u at d (1688878916489445376), 7455/78a0ec27b091da5e4054000013681708/u at d (1688878916496785408), 7456/78a0ec27b091da5e4054000013681708/u at d (1688878917584158720), 7457/78a0ec27b091da5e4054000013681708/u at d (1688878994974310400), 7458/78a0ec27b091da5e4054000013681708/u at d (1688879073177108480), 7459/78a0ec27b091da5e4054000013681708/u at d (1688879074234073088), 7460/78a0ec27b091da5e4054000013681708/u at d (1688879075288940544), 7461/78a0ec27b091da5e4054000013681708/u at d (1688879076339613696), 7462/78a0ec27b091da5e4054000013681708/u at d (1688879077392384000)]} 0 154455 >> >> This results in the following error on the console where I start reindexing: >> >>> $ docker-compose exec dovecot-mailcow doveadm index -u u at d 'Sent' >>> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out (Request queued 134.122 secs ago, 1 send attempts in 134.098 secs, 131.923 in http ioloop, 0.000 in other ioloops, connected 376.408 secs ago) >>> doveadm(u at d): Error: Mailbox Sent: Mail search failed: Internal error occurred. Refer to server log for more information. [2021-01-14 17:09:40] >>> doveadm(u at d): Error: Mailbox Sent: Transaction commit failed: FTS transaction commit failed: backend deinit >> >> >> The last message about indexed block of ten messages (2021-01-14 16:10:00.628 7453-7462) arrives already after reindexing error on the other console occurs and doveadm exits. Interestingly, exactly at the time mentioned in the error message (2021-01-14 17:09:40, i.e. 16:06:40 - I don?t know why timezone differs) a message about indexing of a single message 7463 arrives (which is actually NEXT message to the block that gets indexed 20 seconds later). >> >> The whole mailbox consists of 8530 messages and doveadm/solr can?t finish indexing it at all starting over and over. >> >> Can you please help me understanding the reason of this behaviour? >> >> P.S. Is there any possibility to determine what message in vmail maildir corresponds to solr log, e.g. to 7463/78a0ec27b091da5e4054000013681708/u at d (1688879057103486976) ? >> >> Thanks, Programmierus >-------------- next part -------------- An HTML attachment was scrubbed... URL: <https://dovecot.org/pipermail/dovecot/attachments/20210115/200122c4/attachment-0001.html>
John Fawcett
2021-Jan-16 12:16 UTC
fts_solr: Indexing failed (fails on particular messages/mailboxes by timeout)
On 15/01/2021 11:39, Alexey Panov wrote:> I guess I was indeed correct. As message body is the part of mail > index dovecot indeed sends all those binary messages as a part of > message body: > >> doveadm fetch -u test.user at interpont.com >> <mailto:test.user at interpont.com> "body" mailbox KS2 uid 2 | wc --bytes >> 21549696 > > This affects FTS indexing dramatically I guess?? > >> On 15. Jan 2021, at 17:10, Alexey Panov <programmierus at icloud.com >> <mailto:programmierus at icloud.com>> wrote: >> >> I could find messages causing problems as solr log indeed shows the >> IMAP UID.? >> >> Reducing batch_size to a single message I could identify one of those >> messages causing the problem.? >> >> Now I can confirm the following behaviour: there are some SINGLE >> messages where 60 seconds is NOT ENOUGH to index. The message I am >> looking at now is 20MB big (compressed 17MB) and has a lot of inline >> pasted screenshots. I guess that dovecot might send those pasted >> images as a part of the message body to solr? >> >> Here are some logs for better understanding: >> >> Folder contents (single message): >> >>> doveadm search -u u at d mailbox KS2 >>> e836ff1d20640160340b0000532cee39 2 >> >> Force indexing: >> >>> doveadm index -u u at d KS2 >>> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out >>> (Request queued 61.105 secs ago, 1 send attempts in 61.089 secs, >>> 60.071 in http ioloop, 0.000 in other ioloops, connected 61.089 secs >>> ago) >>> doveadm(u at d): Error: Mailbox KS2: Transaction commit failed: FTS >>> transaction commit failed: backend deinit >> >> Corresponding solr logs: >> >>> solr-mailcow_1 ? ? ? | 2021-01-15 10:02:34.901 INFO >>> ?(qtp524223214-2127) [ ? x:dovecot-fts] >>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] ?webapp=/solr >>> path=/update params={}{add=[2/e836ff1d20640160340b0000532cee39/u at d >>> (1688946475272241152)]} 0 80683 >> >> I would greatly appreciate any help. My users using pasted images >> heavily (designers exchanging mockups etc) and if this truly is a >> problem then I am screwed :)? >> >> >>> On 14. Jan 2021, at 23:46, Alexey Panov <programmierus at icloud.com >>> <mailto:programmierus at icloud.com>> wrote: >>> >>> Hello,? >>> >>> I am running recent dovecot?2.3.13 (89f716dc2) within dockerized >>> mailcow setup.? >>> >>> During just another update routine I decided to reindex all >>> mailboxes and some of them fail (timeout) on exactly SAME messages >>> in SAME mailboxes: >>> >>>> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out >>>> (Request queued 73.213 secs ago, 1 send attempts in 73.098 secs, >>>> 60.597 in http ioloop, 0.000 in other ioloops, connected 127.875 >>>> secs ago) >>>> doveadm(u at d): Error: Mailbox Sent: Mail search failed: Internal >>>> error occurred. Refer to server log for more information. >>>> [2021-01-14 11:23:46] >>>> doveadm(u at d): Error: Mailbox Sent: Transaction commit failed: FTS >>>> transaction commit failed: backend deinit >>> >>> This error is persistent and affects only some, but always same >>> folders of same mailboxes (i mean solr can't finish indexing them at >>> all).? >>> >>> I/O is in perfect condition and has no corruptions. I also tried to >>> do fts rescan which made no change. RAM is fine, I haven't seen OOM >>> conditions. >>> >>> During my research I found?this discussion >>> <https://www.mail-archive.com/dovecot at dovecot.org/msg76944.html>?which >>> lead me idea of reducing the batch_size to as low as 10 (for testing >>> purposes). Observing SOLR logs during reindexing on the >>> mailbox/folder causing the error I noticed the following strange >>> behaviour.? >>> >>> Firstly, it goes very fast but then always kinda freezes on these >>> messages: >>> >>>> solr-mailcow_1 ? ? ? | *2021-01-14 16:05:09.615* INFO >>>> ?(qtp524223214-1151) [ ? x:dovecot-fts] >>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] ?webapp=/solr >>>> path=/update >>>> params={}{add=[6883/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773188952064), 6884/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773203632128), 6885/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773218312192), 6886/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773227749376), 6887/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773237186560), 6888/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773244526592), 6889/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773250818048), 6890/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773259206656), 6891/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773272838144), 6892/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773288566784)]} 0 101 >>>> solr-mailcow_1 ? ? ? | *2021-01-14 16:06:42.540* INFO >>>> ?(qtp524223214-1143) [ ? x:dovecot-fts] >>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] ?webapp=/solr >>>> path=/update >>>> params={}{add=[6893/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773329461248), 6894/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878773332606976), 6895/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878774361260032), 6896/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878850290745344), 6897/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878851347709952), 6898/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878852404674560), 6899/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878853586419712), 6900/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878870704422912), 6901/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878870709665792), 6902/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878870727491584)]} 0 92891 >>> >>> It takes almost one and a half minute to get those ten messages >>> (6893-6902) indexed. It freezes exactly on that position every time >>> I restart indexing. Actually only reducing the batch_size to 10 made >>> it through. It was timing out with the default batch_size exactly >>> here each time.? >>> >>> Than it goes further again blazing fast until it reaches here: >>> >>>> solr-mailcow_1 ? ? ? | *2021-01-14 16:07:26.148* INFO >>>> ?(qtp524223214-1151) [ ? x:dovecot-fts] >>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] ?webapp=/solr >>>> path=/update >>>> params={}{add=[7443/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878884286627840), 7444/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878884293967872), 7445/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878884304453632), 7446/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878884502634496), 7447/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878900331937792), 7448/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878900508098560), 7449/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878916409753600), 7450/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878916424433664), 7451/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878916438065152), 7452/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878916448550912)]} 0 30682 >>>> solr-mailcow_1 ? ? ? |*2021-01-14 16:09:40.277* INFO >>>> ?(qtp524223214-1325) [ ? x:dovecot-fts] >>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] ?webapp=/solr >>>> path=/update >>>> params={}{add=[7463/78a0ec27b091da5e4054000013681708/u at d >>>> (1688879057103486976)]} 0 1 >>>> solr-mailcow_1 ? ? ? | *2021-01-14 16:10:00.628* INFO >>>> ?(qtp524223214-1009) [ ? x:dovecot-fts] >>>> o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] ?webapp=/solr >>>> path=/update >>>> params={}{add=[7453/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878916486299648), 7454/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878916489445376), 7455/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878916496785408), 7456/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878917584158720), 7457/78a0ec27b091da5e4054000013681708/u at d >>>> (1688878994974310400), 7458/78a0ec27b091da5e4054000013681708/u at d >>>> (1688879073177108480), 7459/78a0ec27b091da5e4054000013681708/u at d >>>> (1688879074234073088), 7460/78a0ec27b091da5e4054000013681708/u at d >>>> (1688879075288940544), 7461/78a0ec27b091da5e4054000013681708/u at d >>>> (1688879076339613696), 7462/78a0ec27b091da5e4054000013681708/u at d >>>> (1688879077392384000)]} 0 154455 >>> >>> This results in the following error on the console where I start >>> reindexing:? >>> >>>> $ docker-compose exec dovecot-mailcow doveadm index -u u at d 'Sent' >>>> doveadm(u at d): Error: fts_solr: Indexing failed: Request timed out >>>> (Request queued 134.122 secs ago, 1 send attempts in 134.098 secs, >>>> 131.923 in http ioloop, 0.000 in other ioloops, connected 376.408 >>>> secs ago) >>>> doveadm(u at d): Error: Mailbox Sent: Mail search failed: Internal >>>> error occurred. Refer to server log for more information. >>>> [*2021-01-14 17:09:40*] >>>> doveadm(u at d): Error: Mailbox Sent: Transaction commit failed: FTS >>>> transaction commit failed: backend deinit >>> >>> The last message about indexed block of ten messages (*2021-01-14 >>> 16:10:00.628 7453-7462)*?arrives already after reindexing error on >>> the other console occurs and doveadm exits. Interestingly, exactly >>> at the time mentioned in the error message (*2021-01-14 17:09:40*, >>> i.e. *16:06:40* - I don?t know why timezone differs) a message about >>> indexing of a single message *7463* arrives (which is actually NEXT >>> message to the block that gets indexed 20 seconds later).? >>> >>> The whole mailbox consists of 8530 messages and doveadm/solr can?t >>> finish indexing it at all starting over and over. >>> >>> Can you please help me understanding the reason of this behaviour? >>> >>> P.S. Is there any possibility to determine what message in vmail >>> maildir corresponds to solr log, e.g. to >>> 7463/78a0ec27b091da5e4054000013681708/u at d (1688879057103486976) ?? >>> >>> Thanks, Programmierus >> >I previously posted a patch (7 Dec 2020) you might find useful in this circumstance which includes an additional setting, for example: plugin { ... ??? fts_max_size = 2M ... } fts_max_size - do not parse message bodies if the message size exceeds this value. A value of 0 indicates no limit.? When the message body is not parsed, attachments are also not parsed. John -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://dovecot.org/pipermail/dovecot/attachments/20210116/4c5e2070/attachment-0001.html>