Hi.

We use the fts-solr plugin and have a problem with Solr 1.3+ and HTMLStripWhitespaceTokenizerFactory (Solr schema attached).

Some maildirs contain messages whose attachments carry a wrong "Content-Type:" field. For example:

Content-Type: TEXT/mspowerpoint; name="Zapatec_6zap_netvibes_1.ppt"

Indexing of such messages stops with "fts_solr: Indexing failed: 500 Internal Server Error". The Solr log shows:

SEVERE: java.io.IOException: Mark invalid
    at java.io.BufferedReader.reset(BufferedReader.java:485)

(Mailing list discussion: http://markmail.org/message/2fnfiwygvehjngyr#query:SEVERE%3A%20java.io.IOException%3A%20Mark%20invalid%20lucene+page:1+mid:2fnfiwygvehjngyr+state:results)

It looks like Dovecot tries to index attachments like this one. We get the same error for some other messages as well. Dovecot then stops indexing the mailbox, and every search causes lag and CPU load on the server.

So we need to make Dovecot more robust against this error. As a first step, it would be enough to simply skip the messages for which Solr returns an error.

Let's discuss this issue, because it is a general problem. We are ready to dig into the code where needed.

Regards,
Nikolai

Powered by the 6zap. Sign up at http://www.6zap.com for an account that provides advanced e-mail, calendar and contacts capabilities.
OK, let me boil the problem down to one question: how can the indexing procedure (fts plugin) ignore a "bad" message and move on to the next one?

Right now a single "error 500" from Solr makes Dovecot stop, and every subsequent search query repeats the story. My setup:

# 1.1.11: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.21.7-2.fc8xen i686 Ubuntu 8.04.2 ext3

I've looked through the fts and fts-solr directories in src, without success so far.

Timo, you understand the code much better. Can you help me and point to the right place in the code, or perhaps create a patch, if possible?

On Fri, 05/01/2009 at 5:56pm, "Nikolai Derzhak" <nikolai at 6zap.com> wrote:
> Indexing of such messages stops with "fts_solr: Indexing failed: 500
> Internal Server Error".
> The Solr log shows:
> SEVERE: java.io.IOException: Mark invalid
> at java.io.BufferedReader.reset(BufferedReader.java:485)
> [...]
On Mon, 05/04/2009 at 5:36pm, "Rui Carneiro" <rui.carneiro at portugalmail.net> wrote:
> I am not sure I understood your problem correctly.
>
> Are you trying to index attachments from messages? Or is Dovecot indexing
> some "bad" parts and you just do not know why?

Please read the first post, which has the description. In short: when Dovecot tries to index mails that the Solr tokenizer cannot digest (error 500, "Mark invalid"), Dovecot stops indexing the mailbox and retries on every subsequent search, with the same result.

> Regards,
> Rui Carneiro
> --
> Portugalmail, Comunicações S.A.
> www.portugalmail.net
On Mon, 05/04/2009 at 6:23pm, "Timo Sirainen" <tss at iki.fi> wrote:
> On May 4, 2009, at 10:16 AM, Nikolai Derzhak wrote:
>
>> How to ignore "bad" message and index next one in indexing procedure
>> (fts plugin) ?.
>
> Maybe simply using 1.1.14 would help? I already fixed one Solr issue:
> http://hg.dovecot.org/dovecot-1.1/rev/678c3252a454

I have already merged this commit back into my 1.1.11 code, because it came out of my own request about special characters ;).

> If that's not the problem, it would help to have one of those mails
> that breaks it.

Yes. I now have two new issues:

1. The "Content-Type:" header in the mail is wrong: it says TEXT/mspowerpoint, for example, but the part is a binary *.ppt file.
2. A text/plain mail that contains a lot of HTML from a repository change set (many commits to HTML files).

In both cases Solr returns "Error 500" (HTMLStripReader class; the Solr log says "Mark invalid"). And in both cases we cannot detect the problem in Dovecot code (or it is hard to do).

So for now I need some way to ignore "bad" messages: we cannot catch every variant, yet indexing dies on the first error like this.
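To make the request concrete, here is a minimal sketch in C of the "skip the bad message and carry on" behaviour being asked for. It is not Dovecot code: `index_mailbox`, `index_fn`, `struct index_result` and the demo backend `fake_send` are all hypothetical names invented for illustration, and the convention that a negative status means "backend unreachable" is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback: hands one message to the indexing backend and
   returns the HTTP status of the reply, or a negative value if the
   backend could not be reached at all. */
typedef int (*index_fn)(unsigned int uid, void *context);

struct index_result {
    unsigned int indexed;  /* messages the backend accepted */
    unsigned int skipped;  /* messages the backend rejected */
    unsigned int last_uid; /* last UID handled, accepted or skipped */
};

/* Walk the UIDs in order. A message rejected by the backend (e.g. HTTP
   500) is logged and skipped, so one bad mail cannot block the mailbox.
   A transport failure stops the loop so the rest can be retried later. */
static struct index_result
index_mailbox(const unsigned int *uids, size_t count,
              index_fn send, void *context)
{
    struct index_result res = { 0, 0, 0 };

    for (size_t i = 0; i < count; i++) {
        int status = send(uids[i], context);

        if (status < 0)
            break; /* backend down: do not mark anything as handled */
        if (status >= 200 && status < 300) {
            res.indexed++;
        } else {
            fprintf(stderr, "indexing uid=%u failed: HTTP %d, skipping\n",
                    uids[i], status);
            res.skipped++;
        }
        res.last_uid = uids[i];
    }
    return res;
}

/* Demo backend for illustration only: rejects UID 3 with a 500 and is
   "unreachable" for UID 99. */
static int fake_send(unsigned int uid, void *context)
{
    (void)context;
    if (uid == 99)
        return -1;
    return uid == 3 ? 500 : 200;
}
```

The point of recording `last_uid` even for a skipped message is that the next search does not retry the same failing mail again. Separating "message rejected" from "backend unreachable" keeps a Solr outage from silently marking a whole mailbox as handled.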
I am sending a mail body with the wrong Content-Type. If you put it into a maildir, indexing raises the 500 error. If you change the Content-Type to "application/octet-stream", everything is OK. Dovecot tries to index the content because its type is TEXT/*, and Solr raises a 500 on each search.
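The behaviour described above can be illustrated with a small C sketch. It assumes, as this thread suggests, that the decision to index a part depends only on whether its media type starts with "text/", compared case-insensitively. `content_type_is_text` is a hypothetical helper written for this example, not a function from Dovecot.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the Content-Type value has a "text/" media type.
   The comparison ignores case and leading blanks, so a mislabeled
   "TEXT/mspowerpoint" part counts as text even though its body is a
   binary .ppt file. */
static bool content_type_is_text(const char *content_type)
{
    static const char prefix[] = "text/";

    while (*content_type == ' ' || *content_type == '\t')
        content_type++;

    for (size_t i = 0; prefix[i] != '\0'; i++) {
        /* A shorter input hits its terminating NUL here and fails the
           comparison before anything past the end is read. */
        if (tolower((unsigned char)content_type[i]) != prefix[i])
            return false;
    }
    return true;
}
```

With such a check, `TEXT/mspowerpoint; name="Zapatec_6zap_netvibes_1.ppt"` is accepted as text, while `application/octet-stream` is not. That matches the observation that relabeling the attachment makes the error go away.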
---------- Forwarded message ----------
From: "Nikolai Derzhak" <nikolai at 6zap.com>
To: "Timo Sirainen" <tss at iki.fi>
CC: "Dovecot Mailing" <dovecot at dovecot.org>
Date: 05/05/2009 3:01pm
Subject: Re: [Dovecot] fts-solr plugin issue (Marked invalid)

On Mon, 05/04/2009 at 6:23pm, "Timo Sirainen" <tss at iki.fi> wrote:
> On May 4, 2009, at 10:16 AM, Nikolai Derzhak wrote:
>
>> How to ignore "bad" message and index next one in indexing procedure
>> (fts plugin) ?.
>
> Maybe simply using 1.1.14 would help? I already fixed one Solr issue:
> http://hg.dovecot.org/dovecot-1.1/rev/678c3252a454

I have already merged this commit back into my 1.1.11 code, because it came out of my own request about special characters ;).

> If that's not the problem, it would help to have one of those mails
> that breaks it.

Yes. I now have two new issues:

1. The "Content-Type:" header in the mail is wrong: it says TEXT/mspowerpoint, for example, but the part is a binary *.ppt file.
2. A text/plain mail that contains a lot of HTML from a repository change set (many commits to HTML files).

In both cases Solr returns "Error 500" (HTMLStripReader class; the Solr log says "Mark invalid"). And in both cases we cannot detect the problem in Dovecot code (or it is hard to do).

So for now I need some way to ignore "bad" messages: we cannot catch every variant, yet indexing dies on the first error like this.