>> *- Installation:*
>>
>> -> Create a clean install using the default (at least in the Arch Linux
>> package), and do a "sudo -u solr solr create -c dovecot". The config
>> files are then in /opt/solr/server/solr/dovecot/conf and data files in
>> /opt/solr/server/solr/dovecot/data
>
> On my system (Debian) these directories are wildly different (e.g. data
> is under /var), but other than that, this information is OK. Used this
> as a side-reference for Debian installation:
> https://tecadmin.net/install-apache-solr-on-debian/
>
> Accessed http://solr-host.tld:8983/solr/ to check whether all is OK.

Make sure you have a dovecot instance (not the default instance), created with the command below:

    solr create -c dovecot    (or whatever name)

> Weirdly, rescan returns immediately here. When I perform
> `doveadm index INBOX` for my test user, I do see a lot of fts and HTTP
> activity.

The Solr plugin is not fully coded; the refresh and rescan functions are not really implemented:
https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/fts-backend-solr.c

    static int fts_backend_solr_refresh(struct fts_backend *backend ATTR_UNUSED)
    {
    	return 0;
    }

    static int fts_backend_solr_rescan(struct fts_backend *backend)
    {
    	/* FIXME: proper rescan needed. for now we'll just reset the
    	   last-uids */
    	return fts_backend_reset_last_uids(backend);
    }

>> *- Bugs so far*
>>
>> -> Line 620 of fts_solr dovecot plugin: the size of the header is
>> improperly calculated ("huge header" warning for a simple email,
>> which kills the indexing of that email, so basically MOST emails,
>> as the calculation is wrong)

You can check that regularly in the Dovecot log file. My guess is the mix of Unicode, which is not properly handled here.

>> -> The UID returned by Solr is to be considered as a STRING (and that
>> is maybe the source of the "out of bound" errors in fts_solr dovecot,
>> as "long" is not enough)

This is just highly visible in Solr schema.xml. Switching it to "long" in schema.xml returns plenty of errors.

>> -> Java errors: A lot of nonsense for me, I am not an expert in Java.
>> But with increased memory it seems not to crash, even if it complains
>> quite a lot in the logs
>
> Can you elaborate on the errors you have seen so far? When do these
> happen? How can I reproduce them?

Honestly, I have no clue what the problems are. I just increased the memory of the JVM and the systems stopped crashing. Log files are huge anyway.
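One way to confirm that the core made with `solr create -c dovecot` really exists is to query it over HTTP rather than only opening the admin page. The following is a minimal sketch, not something taken from Dovecot or Solr: the helper names `core_select_url`, `num_found` and `count_documents` are made up for illustration, the host is the placeholder used above, and it assumes the core exposes Solr's usual `select` handler with JSON output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_select_url(base_url, core):
    """Build a match-all query URL for one core that asks for zero rows."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{query}"


def num_found(response_text):
    """Extract the document count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def count_documents(base_url, core, timeout=5.0):
    """Fetch the count over HTTP; urlopen raises if the core is missing."""
    with urlopen(core_select_url(base_url, core), timeout=timeout) as reply:
        return num_found(reply.read().decode("utf-8"))
```

Calling `count_documents("http://solr-host.tld:8983/solr", "dovecot")` before and after a `doveadm index INBOX` run should show whether anything was actually indexed into that core.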
(forgot to CC mailing list)

On 26/01/2019 at 20:07, Joan Moreau via dovecot wrote:
>> *- Bugs so far*
>>
>> -> Line 620 of fts_solr dovecot plugin: the size of the header is
>> improperly calculated ("huge header" warning for a simple email,
>> which kills the indexing of that email, so basically MOST
>> emails, as the calculation is wrong)
> *You can check that regularly in the Dovecot log file. My guess is the
> mix of Unicode, which is not properly handled here.*

Does this happen with specific messages? Do you have a sample message for me? I don't see how Unicode could cause this.

>> -> The UID returned by Solr is to be considered as a STRING (and that
>> is maybe the source of the "out of bound" errors in
>> fts_solr dovecot, as "long" is not enough)
> *This is just highly visible in Solr schema.xml. Switching it to
> "long" in schema.xml returns plenty of errors.*

I cannot reproduce this so far (see modified schema below). In a simple test I just get the desired results and no errors logged.

>> -> Java errors: A lot of nonsense for me, I am not an expert in Java.
>> But with increased memory it seems not to crash, even if it
>> complains quite a lot in the logs
>>
>> Can you elaborate on the errors you have seen so far? When do these
>> happen? How can I reproduce them?
> *Honestly, I have no clue what the problems are. I just increased the
> memory of the JVM and the systems stopped crashing. Log files are huge
> anyway.*

What errors do you see? I see only INFO entries in my /var/solr/logs/solr.log. Looks like Solr is pretty verbose by default (lots of INFO output), but there must be a way to reduce that.

Regards,

Stephan.

<?xml version="1.0" encoding="UTF-8"?>
<schema name="dovecot" version="2.0">
  <uniqueKey>id</uniqueKey>

  <fieldType name="long" class="solr.LongPointField" positionIncrementGap="0"/>

  <fieldType name="dovecottext" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1" preserveOriginal="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="dovecotfield" class="solr.TextField" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="string" class="solr.StrField"/>

  <field name="_version_" type="string" indexed="true" stored="true"/>
  <field name="bcc" type="string" indexed="false" stored="false"/>
  <field name="body" type="dovecottext" indexed="true" stored="false"/>
  <field name="box" type="string" indexed="true" required="true" stored="true"/>
  <field name="cc" type="dovecotfield" indexed="true" stored="false"/>
  <field name="from" type="dovecotfield" indexed="true" stored="false"/>
  <field name="hdr" type="string" indexed="false" stored="false"/>
  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="subject" type="dovecottext" indexed="true" stored="false"/>
  <field name="to" type="dovecotfield" indexed="true" stored="false"/>
  <field name="uid" type="long" indexed="true" required="true" stored="true"/>
  <field name="user" type="string" indexed="true" required="true" stored="true"/>
</schema>

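The schema above declares `uid` as a 64-bit `long`. For reference, IMAP UIDs are unsigned 32-bit values (RFC 3501), so the UID range by itself fits in that type with a lot of room to spare. A minimal sketch of the arithmetic follows; the constant and function names are illustrative only, and it says nothing about what Solr actually returned in the failing setup.

```python
IMAP_UID_MAX = 2**32 - 1   # largest IMAP UID (unsigned 32-bit, RFC 3501)
INT32_MAX = 2**31 - 1      # largest signed 32-bit integer
INT64_MAX = 2**63 - 1      # largest signed 64-bit integer (a Java long)


def fits(value, maximum):
    """Return True if a non-negative value is representable up to maximum."""
    return 0 <= value <= maximum


if __name__ == "__main__":
    print("fits in signed 32-bit:", fits(IMAP_UID_MAX, INT32_MAX))
    print("fits in signed 64-bit:", fits(IMAP_UID_MAX, INT64_MAX))
```

If out-of-range values do show up, this suggests looking at how the value is parsed or stored somewhere along the way (for instance a signed 32-bit type) rather than at the UID range itself, though that is only a hint about where to look.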
On 2019-01-30 07:33, Stephan Bosch wrote:

> (forgot to CC mailing list)
>
> On 26/01/2019 at 20:07, Joan Moreau via dovecot wrote:
>
>>> *- Bugs so far*
>>>
>>> -> Line 620 of fts_solr dovecot plugin: the size of the header is
>>> improperly calculated ("huge header" warning for a simple email,
>>> which kills the indexing of that email, so basically MOST emails,
>>> as the calculation is wrong)
>> *You can check that regularly in the Dovecot log file. My guess is the
>> mix of Unicode, which is not properly handled here.*
>
> Does this happen with specific messages? Do you have a sample message
> for me? I don't see how Unicode could cause this.

My only guess is that it refers to some 'strlen', which is of course wrong for Unicode emails. This is just a guess. But grep for "huge" in the Dovecot log of a busy server to find examples. (Sorry, I switched to Xapian, as Solr was creating too much trouble for my server, so I have no more concrete examples.)

>>> -> The UID returned by Solr is to be considered as a STRING (and that
>>> is maybe the source of the "out of bound" errors in fts_solr dovecot,
>>> as "long" is not enough)
>> *This is just highly visible in Solr schema.xml. Switching it to
>> "long" in schema.xml returns plenty of errors.*
>
> I cannot reproduce this so far (see modified schema below). In a simple
> test I just get the desired results and no errors logged.

I got this with large mailboxes (where the UID seems not acceptable for Solr). The fault is not on the Dovecot side but on Solr's: the UID(s) returned for a search are garbage instead of proper values. Putting it as string solves this.

>>> -> Java errors: A lot of nonsense for me, I am not an expert in Java.
>>> But with increased memory it seems not to crash, even if it complains
>>> quite a lot in the logs
>>>
>>> Can you elaborate on the errors you have seen so far? When do these
>>> happen? How can I reproduce them?
>> *Honestly, I have no clue what the problems are. I just increased the
>> memory of the JVM and the systems stopped crashing. Log files are huge
>> anyway.*
>
> What errors do you see? I see only INFO entries in my
> /var/solr/logs/solr.log. Looks like Solr is pretty verbose by default
> (lots of INFO output), but there must be a way to reduce that.

I deleted Solr. No more logs. Maybe someone else can tell.
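The `strlen` guess above comes down to the difference between counting bytes and counting characters in UTF-8 text. The sketch below only illustrates that difference; it does not reproduce whatever check the plugin performs around its "huge header" warning, and `text_lengths` is a made-up helper name.

```python
def text_lengths(text):
    """Return (characters, utf8_bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    for sample in ("Subject", "Grüße"):
        chars, size = text_lengths(sample)
        print(f"{sample!r}: {chars} characters, {size} bytes")
```

For pure ASCII the two counts are equal; for non-ASCII text the byte count is larger, so a limit expressed in bytes is reached sooner than the visible length of the header would suggest.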