Hi!

I use dovecot 2.1.7 on Ubuntu 12.10 with fts_solr and decode2text.sh for indexing attachments. This works great in general. Just for one user there is a problem with an unknown bad attachment.

I run "doveadm index -A '*'". After a while I receive:

doveadm(xyz): Error: fts_solr: Invalid XML input at line 1: mismatched tag
doveadm(xyz): Panic: file solr-connection.c: line 545 (solr_connection_post_more): assertion failed: (maxfd >= 0)
doveadm(xyz): Error: Raw backtrace: /usr/lib/dovecot/libdovecot.so.0(+0x3c14a) [0x7f7ce2c1714a] -> /usr/lib/dovecot/libdovecot.so.0(default_fatal_handler+0x2a) [0x7f7ce2c1720a] -> /usr/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f7ce2bee81a] -> /usr/lib/dovecot/modules/lib21_fts_solr_plugin.so(solr_connection_post_more+0x249) [0x7f7ce11913a9] -> /usr/lib/dovecot/modules/lib21_fts_solr_plugin.so(+0x4597) [0x7f7ce118e597] -> /usr/lib/dovecot/modules/lib20_fts_plugin.so(+0x6f57) [0x7f7ce159df57] -> /usr/lib/dovecot/modules/lib20_fts_plugin.so(fts_build_mail+0xf5) [0x7f7ce159e085] -> /usr/lib/dovecot/modules/lib20_fts_plugin.so(+0xba70) [0x7f7ce15a2a70] -> doveadm(+0x15309) [0x7f7ce35cc309] -> doveadm(+0x11f36) [0x7f7ce35c8f36] -> doveadm(+0x12bf1) [0x7f7ce35c9bf1] -> doveadm(doveadm_mail_try_run+0x161) [0x7f7ce35c9ed1] -> doveadm(main+0x3d1) [0x7f7ce35c8ae1] -> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f7ce283d76d] -> doveadm(+0x11d15) [0x7f7ce35c8d15]

In catalina.out I find:

Nov 18, 2012 2:59:09 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 start byte 0xfc (at char #25214836, byte #26687495)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:316)
    at org.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:81)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:722)
Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 start byte 0xfc (at char #25214836, byte #26687495)
    at com.ctc.wstx.sr.StreamScanner.constructFromIOE(StreamScanner.java:625)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:994)
    at com.ctc.wstx.sr.StreamScanner.getNext(StreamScanner.java:754)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2691)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1065)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    ... 19 more
Caused by: java.io.CharConversionException: Invalid UTF-8 start byte 0xfc (at char #25214836, byte #26687495)
    at com.ctc.wstx.io.UTF8Reader.reportInvalidInitial(UTF8Reader.java:303)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:189)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:87)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:988)
    ... 25 more

doveadm index stops after this error. How can I make doveadm just skip the error and continue indexing?

Thanks
Robert

--
Robert Strötgen
Abteilungsleiter Informationsmanagement und Publikationen
Georg-Eckert-Institut für internationale Schulbuchforschung
Celler Str. 3
38114 Braunschweig
Tel. +49 (0)531 59099-47 & +49 (0)531 123103-205
Fax +49 (0)531 59099-99
http://www.gei.de/

On 18.11.2012, at 16.54, Robert Strötgen wrote:

> Nov 18, 2012 2:59:09 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 start byte
> 0xfc (at char #25214836, byte #26687495)

Annoying. I guess these fix it:

http://hg.dovecot.org/dovecot-2.1/rev/172295f5a78b
http://hg.dovecot.org/dovecot-2.1/rev/01550514f189
http://hg.dovecot.org/dovecot-2.1/rev/339e654f371e

On 11/26/2012 5:50 PM, Timo Sirainen wrote:

> On 18.11.2012, at 16.54, Robert Strötgen wrote:
>
>> Nov 18, 2012 2:59:09 PM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 start byte
>> 0xfc (at char #25214836, byte #26687495)
> Annoying. I guess these fix it:
>
> http://hg.dovecot.org/dovecot-2.1/rev/172295f5a78b
> http://hg.dovecot.org/dovecot-2.1/rev/01550514f189
> http://hg.dovecot.org/dovecot-2.1/rev/339e654f371e
>

These patches have improved fts for me - but I still have errors like:

Nov 26 20:49:29 bubba dovecot: indexer-worker(dmiller at amfes.com): Panic: file solr-connection.c: line 547 (solr_connection_post_more): assertion failed: (maxfd >= 0)
Nov 26 20:49:29 bubba dovecot: indexer-worker(dmiller at amfes.com): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so.0(+0x45cea) [0x7f0c66c33cea] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x45d2e) [0x7f0c66c33d2e] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f0c66c07d10] -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so(+0x6de5) [0x7f0c653a6de5] -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so(+0x3867) [0x7f0c653a3867] -> /usr/local/lib/dovecot/lib20_fts_plugin.so(fts_build_mail+0x53b) [0x7f0c655b2b2b] -> /usr/local/lib/dovecot/lib20_fts_plugin.so(+0xc530) [0x7f0c655b7530] -> dovecot/indexer-worker [dmiller at amfes.com Archives/2010 - 7000/7266]() [0x402326] -> dovecot/indexer-worker [dmiller at amfes.com Archives/2010 - 7000/7266]() [0x4026cc] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f0c66c40b76] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f0c66c419c7] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f0c66c406b8] -> /usr/local/lib/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f0c66c2c203] -> dovecot/indexer-worker [dmiller at amfes.com Archives/2010 - 7000/7266](main+0x10a) [0x401dfa] -> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f0c6685276d] -> dovecot/indexer-worker [dmiller at amfes.com Archives/2010 - 7000/7266]() [0x401e9d]

The solr log shows:

Nov 26, 2012 8:49:29 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code 8))
 at [row,col {unknown-source}]: [1011144,197790]

--
Daniel

On 11/27/2012 7:28 AM, Daniel L. Miller wrote:

> On 11/26/2012 10:08 PM, Timo Sirainen wrote:
>> On 27.11.2012, at 7.50, Timo Sirainen wrote:
>>
>>>> Nov 26, 2012 8:49:29 PM org.apache.solr.common.SolrException log
>>>> SEVERE: org.apache.solr.common.SolrException: Illegal character
>>>> ((CTRL-CHAR, code 8))
>>>> at [row,col {unknown-source}]: [1011144,197790]
>>> Something's wrong. The Solr code was already supposed to catch all
>>> of these.
>>

I was taking a brief scan of the code - and as usual I'm probably wrong - but I believe the protection comes from the xml_encode functions. Could it be that there are some solr writes that don't go through that function - because it is assumed that the data in question doesn't need that processing? Like mailbox names, field names, or uids - that SHOULDN'T have any garbage but maybe something is creeping in?

--
Daniel

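To make the kind of protection discussed above concrete, here is a minimal Python sketch of cleaning a byte string before it is placed in an XML update document. It is not Dovecot's xml_encode code; the names `to_xml_text` and `_XML_ILLEGAL` are hypothetical and exist only for this illustration. The sketch assumes the input should be UTF-8, replaces undecodable bytes (such as the 0xfc from the first report) and characters outside the XML 1.0 `Char` range (such as control code 8) with U+FFFD, then escapes markup characters.

```python
import re
from xml.sax.saxutils import escape

# Matches any character outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def to_xml_text(raw: bytes) -> str:
    """Return text that is safe to embed as XML character data.

    Hypothetical helper for illustration only, not part of Dovecot or Solr.
    """
    # Bytes that are not valid UTF-8 become U+FFFD instead of raising.
    text = raw.decode("utf-8", errors="replace")
    # Control characters and other code points XML 1.0 forbids become U+FFFD.
    text = _XML_ILLEGAL.sub("\uFFFD", text)
    # Escape &, < and > so the text cannot break the surrounding markup.
    return escape(text)


if __name__ == "__main__":
    # A Latin-1 encoded "ü" (0xfc) followed by a backspace control character.
    print(to_xml_text(b"f\xfcr <a>\x08"))
```

The point of the sketch is that every value written into the document, including ones assumed to be clean such as mailbox names or field names, would have to pass through the same function for the guarantee to hold.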
On 11/28/2012 8:49 AM, Daniel L. Miller wrote:

> On 11/28/2012 12:55 AM, Timo Sirainen wrote:
>> On 28.11.2012, at 10.50, Daniel L. Miller wrote:
>>
>>> On 11/27/2012 6:45 PM, Timo Sirainen wrote:
>>>> On 28.11.2012, at 4.43, Daniel L. Miller wrote:
>>>>
>>>>>> I did go through the code looking for that a few times already
>>>>>> but didn't notice anything. I went through it once more, and
>>>>>> finally found the problem. :)
>>>>>> http://hg.dovecot.org/dovecot-2.1/rev/6a97faf3e500
>>>>>>
>>>>> :( Mine still breaks. Both UTF-8 and Control-Char errors.
>>>> Can you grab the network traffic between Dovecot and Solr and find
>>>> the problematic stream?
>>>>
>>> Tell me how and I'll be happy to!
>> Maybe the easiest would be to use tcpflow. It outputs different TCP
>> streams to different files. From them you can then grep for the error
>> and look closer into it. I guess something like wireshark would work
>> too, but I've never been able to use its GUI in a useful way.
>>
> Would I just do "tcpflow -i lo port 8983"? Or something else?
>

Stream capture sent to you.

--
Daniel
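
For looking through the per-stream files that a capture like the one above produces, a small script can be easier than grep, since the two errors in this thread are about raw bytes. The following Python sketch reports the byte offsets of invalid UTF-8 bytes and of control characters other than tab, LF and CR. The function names `find_problem_bytes` and `scan_file` are hypothetical, and the sketch assumes each capture file holds one direction of one TCP stream as plain bytes.

```python
from pathlib import Path

# Control bytes that XML 1.0 allows: tab, line feed, carriage return.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_problem_bytes(data: bytes):
    """Return a sorted list of (offset, description) for suspicious bytes."""
    problems = []

    # Pass 1: control characters that an XML parser would reject.
    for offset, byte in enumerate(data):
        if byte < 0x20 and byte not in ALLOWED_CONTROL:
            problems.append((offset, f"control char 0x{byte:02x}"))

    # Pass 2: bytes that do not decode as UTF-8. After each failure,
    # resume one byte later. This re-slices the data per error, which is
    # simple but slow for very large files with many bad bytes.
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            bad = pos + err.start
            problems.append((bad, f"invalid UTF-8 byte 0x{data[bad]:02x}"))
            pos = bad + 1

    return sorted(problems)


def scan_file(path: str) -> None:
    """Print every problem found in one capture file."""
    for offset, what in find_problem_bytes(Path(path).read_bytes()):
        print(f"{path}: byte {offset}: {what}")
```

Calling `scan_file` on each capture file and then inspecting the bytes around a reported offset should show which message part or field the bad byte came from.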