Robinson Bomze
2015-Jun-03 15:43 UTC
indexer-worker crashes handling mails with big attachments (dovecot 2.2.16/2.2.18 + FTS Apache Solr + Tika)
Hi,

yesterday I tried to set up Dovecot with Solr (3.6.2) + Tika (1.8) for FTS. I started with a fresh Debian 8.0 system and Dovecot 2.2.13 from the Debian repository. After I ran into some issues with Tika/Dovecot, I read on the mailing list that these problems were fixed in 2.2.14+, so I tried 2.2.18.

With 2.2.18 I get panics with big (ok... huge) attachments. Most mailboxes (and their attachments) are indexed fine, but on some I got panics from the indexer-worker. I was able to isolate the problem. It seems that when Tika (which works flawlessly) sends a big reply to Dovecot and Dovecot sends this data on to Solr, the communication between Dovecot and Solr crashes.

E.g. indexing an email with a 200k-character Word file results in a crash of the indexer-worker:

Jun 02 23:50:57 indexer-worker(username): Warning: I/O leak: 0x7ff65f39f540 (line 120, fd 20)
Jun 02 23:50:57 indexer-worker(username): Warning: Timeout leak: 0x7ff65f39f2e0 (line 325)
Jun 02 23:50:57 indexer: Error: Indexer worker disconnected, discarding 1 requests for username
Jun 02 23:50:57 imap(username): Error: indexer failed to index mailbox INBOX.username
Jun 02 23:50:57 indexer-worker(username): Fatal: master: service(indexer-worker): child 11429 killed with signal 11 (core dumped)

I got similar results with 2.2.16:

Jun 02 23:21:12 indexer-worker(username): Warning: I/O leak: 0x7ffff7811cc0 (line 127, fd 20)
Jun 02 23:21:12 indexer-worker(username): Panic: file ioloop.c: line 39 (io_add_file): assertion failed: (callback != NULL)
Jun 02 23:21:12 indexer-worker(username): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so.0(+0x77130) [0x7ffff7842130] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x7
Jun 02 23:21:12 indexer: Error: Indexer worker disconnected, discarding 1 requests for username
Jun 02 23:21:12 imap(username): Error: indexer failed to index mailbox INBOX.username
Jun 02 23:21:12 indexer-worker(username): Fatal: master: service(indexer-worker): child 7909 killed with signal 6 (core dumps disabled)

The problem was already posted:
http://dovecot.org/pipermail/dovecot/2015-May/100901.html

I could trigger the same panic by running the indexer via 'doveadm index -u username MAILBOX'.

Here is a backtrace (bt) of the 2.2.18 crash (in frame #8 you can see a fragment of the text sent to Solr):

#0 array_count_i (array=0x8) at array.h:155
#1 array_get_modifiable_i (count_r=<synthetic pointer>, array=0x8) at array.h:228
#2 priorityq_remove_idx (pq=0x0, idx=0) at priorityq.c:121
#3 0x00007ff65f3ef5eb in priorityq_remove (pq=<optimized out>, item=item@entry=0xa26920) at priorityq.c:138
#4 0x00007ff65f3e1e70 in timeout_remove (_timeout=<optimized out>) at ioloop.c:288
#5 0x00007ff65f3e2781 in io_loop_move_timeout (_timeout=_timeout@entry=0xa27f98) at ioloop.c:861
#6 0x00007ff65f39ff37 in http_client_connection_switch_ioloop (conn=conn@entry=0xa27ea0) at http-client-connection.c:1357
#7 0x00007ff65f3a3d68 in http_client_switch_ioloop (client=client@entry=0xa0bf20) at http-client.c:211
#8 0x00007ff65f39c005 in http_client_request_continue_payload (_req=_req@entry=0xa0ee88, data=0xa42fa0 "k for evidence of fluid spill.\nIf the device is mounted on a stand, examine the condition of the mount.\nIf the device moves on casters, check the condition of the casters. Check operation of brakes, i"..., size=55453) at http-client-request.c:566
#9 0x00007ff65f39c22a in http_client_request_send_payload (_req=_req@entry=0xa0ee88, data=<optimized out>, size=<optimized out>) at http-client-request.c:625
#10 0x00007ff65e972429 in solr_connection_post_more (post=0xa0ee80, data=<optimized out>, size=size@entry=55453) at solr-connection.c:504
#11 0x00007ff65e96ea09 in fts_backed_solr_build_commit (ctx=0xa1a880) at fts-backend-solr.c:341
#12 0x00007ff65e96eaad in fts_backend_solr_update_set_mailbox (_ctx=0xa1a880, box=0x0) at fts-backend-solr.c:407
#13 0x00007ff65eb7cfac in fts_backend_set_cur_mailbox (ctx=ctx@entry=0xa1a880) at fts-api.c:129
#14 0x00007ff65eb7cfe3 in fts_backend_update_deinit (_ctx=<optimized out>) at fts-api.c:143
#15 0x00007ff65eb8303c in fts_transaction_end (t=t@entry=0xa11ed0) at fts-storage.c:550
#16 0x00007ff65eb83e91 in fts_transaction_commit (t=0xa11ed0, changes_r=0x7ffdcdca5e30) at fts-storage.c:615
#17 0x00007ff65f688a82 in mailbox_transaction_commit_get_changes (_t=_t@entry=0x7ffdcdca5ee0, changes_r=changes_r@entry=0x7ffdcdca5e30) at mail-storage.c:1837
#18 0x00007ff65f688b2e in mailbox_transaction_commit (t=t@entry=0x7ffdcdca5ee0) at mail-storage.c:1818

"bt full" looks like this:

#0 array_count_i (array=0x8) at array.h:155
No locals.
#1 array_get_modifiable_i (count_r=<synthetic pointer>, array=0x8) at array.h:228
No locals.
#2 priorityq_remove_idx (pq=0x0, idx=0) at priorityq.c:121
count = <optimized out>
#3 0x00007ff65f3ef5eb in priorityq_remove (pq=<optimized out>, item=item@entry=0xa26920) at priorityq.c:138
No locals.
#4 0x00007ff65f3e1e70 in timeout_remove (_timeout=<optimized out>) at ioloop.c:288
timeout = 0xa26920
#5 0x00007ff65f3e2781 in io_loop_move_timeout (_timeout=_timeout@entry=0xa27f98) at ioloop.c:861
new_to = 0xa1adf0
old_to = <optimized out>
#6 0x00007ff65f39ff37 in http_client_connection_switch_ioloop (conn=conn@entry=0xa27ea0) at http-client-connection.c:1357
No locals.
#7 0x00007ff65f3a3d68 in http_client_switch_ioloop (client=client@entry=0xa0bf20) at http-client.c:211
conn = 0xa27ea0
_conn = 0xa27ea0
host = <optimized out>
peer = <optimized out>
#8 0x00007ff65f39c005 in http_client_request_continue_payload (_req=_req@entry=0xa0ee88, data=0xa42fa0 "k for evidence of fluid spill.\nIf the device is mounted on a stand, examine the condition of the mount.\nIf the device moves on casters, check the condition of the casters. Check operation of brakes, i"..., size=55453) at http-client-request.c:566
prev_ioloop = 0x9f4730
req = 0xa36970
conn = 0xa27ea0
client = 0xa0bf20
ret = <optimized out>
__FUNCTION__ = "http_client_request_continue_payload"
#9 0x00007ff65f39c22a in http_client_request_send_payload (_req=_req@entry=0xa0ee88, data=<optimized out>, size=<optimized out>) at http-client-request.c:625
__FUNCTION__ = "http_client_request_send_payload"
#10 0x00007ff65e972429 in solr_connection_post_more (post=0xa0ee80, data=<optimized out>, size=size@entry=55453) at solr-connection.c:504
conn = 0xa0be50
__FUNCTION__ = "solr_connection_post_more"

I hope someone can fix the code... I need this feature :)

Thanks a lot in advance!
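A note on reading frames #0 to #2 of the backtrace above: priorityq_remove_idx is entered with pq=0x0, and the array helpers below it then receive array=0x8. That pattern is what one would expect when the address of a struct member is taken through a NULL pointer: the result is simply the member's offset. The sketch below illustrates the arithmetic only. The struct names and layout are hypothetical stand-ins chosen for illustration, not Dovecot's actual definitions, and the value 8 assumes a typical 64-bit ABI with 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, NOT Dovecot's real types. */
struct sketch_array {
    void *buffer;
    size_t element_size;
};

struct sketch_priorityq {
    int (*cmp)(const void *, const void *); /* one pointer-sized member first */
    struct sketch_array items;              /* so this member sits right after it */
};

/* Computes the address that "&pq->items" would yield, using integer
 * arithmetic so that a NULL pq is never actually dereferenced. */
uintptr_t sketch_items_address(const struct sketch_priorityq *pq)
{
    return (uintptr_t)pq + offsetof(struct sketch_priorityq, items);
}

/* With pq == NULL the "address" of items is just its offset in the struct. */
void sketch_demo(void)
{
    assert(sketch_items_address(NULL) ==
           offsetof(struct sketch_priorityq, items));
}
```

Under these assumptions, a NULL queue pointer yields a member address equal to one pointer size, which matches the 0x8 seen in frames #0 and #1. This only explains the shape of the crash; it does not identify why the queue pointer was NULL at that point.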
Davide
2015-Jun-03 16:08 UTC
indexer-worker crashes handling mails with big attachments (dovecot 2.2.16/2.2.18 + FTS Apache Solr + Tika)
I have the same problem with the same setup as Mr Bomze, except that my Solr is at version 4.10.3. My Tika server is also at version 1.8.