Dror Matalon
2009-May-06 20:19 UTC
[Dovecot] Solr FTS issues (Was fts-solr plugin issue (Marked invalid))
Hi,

Sorry for the change of thread; I just signed up to the list, so I couldn't reply to the earlier message.

Let me clarify the issue that Nikolai was describing. We're running Dovecot 1.1.11 and Solr 1.4. The issue is quite simple:

1. I run a search.
2. Dovecot sends a list of emails to Solr.
3. Solr starts indexing them.
4. Solr runs into a "bad" email and we get:

    SEVERE: java.io.IOException: Mark invalid
    org.apache.solr.common.SolrException log
    SEVERE: java.io.IOException: Mark invalid
        at java.io.BufferedReader.reset(BufferedReader.java:485)
        at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
        at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
        at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
        at java.io.Reader.read(Reader.java:123)
        at org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:109)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
        at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
        ...

5. All of the email messages indexed so far in that batch are rolled back, and we're stuck.

I think that the solution should be simple too :-). When Solr runs into a bad email, it should just ignore it and keep indexing. This seems much more robust: the emails come from a variety of sources, and we can assume that some of them will be badly formatted. Leaving a few bad emails unindexed is much better than the current situation, where one bad email stops all searching.

I don't understand the architecture of fts, fts_solr, and Solr well enough to know where this should be solved. Ideally, it would be a simple directive to Solr.

Regards,
Dror

-----
Dror Matalon
President
Zapatec Inc
866 522-7941 X 704
1700 MLK Way
Berkeley, CA 94709
http://www.6zap.com
http://twitter.com/drormata
http://www.zapatec.com
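
The "ignore the bad email and keep indexing" behaviour asked for above can be illustrated with a minimal Python sketch. It is not how Dovecot's fts_solr plugin or Solr actually work; every name in it (index_skipping_failures, fake_index_one, the "id" and "body" keys) is hypothetical, and the indexer is a stand-in that raises an error for one message to mimic the "Mark invalid" failure. The only point it makes is that indexing each message on its own lets a failure be recorded and skipped instead of rolling back the whole batch.

```python
from typing import Callable, Iterable, List, Tuple


def index_skipping_failures(
    messages: Iterable[dict],
    index_one: Callable[[dict], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Index messages one at a time, skipping any that fail.

    `index_one` is a hypothetical callable that indexes a single message
    and raises an exception when the message cannot be processed.
    Returns (ids that were indexed, (id, error text) for skipped ones).
    """
    indexed: List[str] = []
    skipped: List[Tuple[str, str]] = []
    for message in messages:
        try:
            index_one(message)
        except Exception as exc:
            # Record the failure and move on; earlier messages stay indexed.
            skipped.append((message["id"], str(exc)))
            continue
        indexed.append(message["id"])
    return indexed, skipped


def fake_index_one(message: dict) -> None:
    """Stand-in indexer (an assumption, not a Solr call): rejects one pattern."""
    if "<bad" in message["body"]:
        raise IOError("Mark invalid")


if __name__ == "__main__":
    batch = [
        {"id": "1", "body": "hello"},
        {"id": "2", "body": "<bad markup"},
        {"id": "3", "body": "world"},
    ]
    indexed, skipped = index_skipping_failures(batch, fake_index_one)
    print("indexed:", indexed)
    print("skipped:", skipped)
```

Whether this per-message handling belongs in fts, fts_solr, or Solr itself is exactly the open question in the message; the sketch only shows the control flow, and indexing one message per request would trade some throughput for that isolation.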