Hi

This is the summary of my work with SOLR-Dovecot, in my *quest to reproduce the previously excellent work of fts_squat*.

@Aki: Based on the time I have spent on this, I would love to see you update the wiki with these improvements, and add my name somewhere.

@All: Hope it helps.

*- Installation:*

-> Create a clean install using the defaults (at least in the Arch Linux package), and run "sudo -u solr solr create -c dovecot". The config files are then in /opt/solr/server/solr/dovecot/conf and the data files in /opt/solr/server/solr/dovecot/data

-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

    * around line 313, change <openSearcher>false</openSearcher> to <openSearcher>true</openSearcher>

    * around line 147, set <writeLockTimeout>2000</writeLockTimeout> (or higher)

    * around line 696, uncomment <str name="df">hdr</str>

    * around line 1127, before <updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add <schemaFactory class="ClassicIndexSchemaFactory"></schemaFactory>

    * around line 1161, delete the whole <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields"> element

    * around line 1192, remove the whole <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" ... /> element

-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace schema.xml with the one below to reproduce the fts_squat behavior (equivalent to "fts_squat = partial=3 full=25" in dovecot.conf). (Note: that is a lot of trouble to replace a single line of setup, but anyway...)
-> Move /opt/solr/server/solr (or just its data subfolder) to a partition with plenty of *space*, ideally on ext4 or a faster file system. (Solr does not seem to consider using a simple MySQL database. That would avoid all the fuss and let it move away from Java, but that is another story.)

-> Configure dovecot.conf as shown below

-> The systemd unit should set high limits for open files and processes (LimitNOFILE/LimitNPROC, see below)

-> Increase the memory available to the Java VM. I set 12 GB because I have plenty of room on my server, but adapt it to your specs: in /opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"

-> Solr complains a lot and greatly pollutes your audit logs, so you may want to filter its output in syslog-ng or journald

-> (Re)start Solr first, then Dovecot, with systemctl

-> Launch the reindex (doveadm fts rescan -u <username>)

-> Wait a long while to let the system reindex all your mailboxes

*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the header size is calculated incorrectly. A "huge header" warning is raised for an ordinary email, which kills indexing of that email. Because the calculation is wrong, this affects MOST emails.

-> The UID returned by Solr should be treated as a STRING (this may be the source of the "out of bound" errors in fts_solr, as "long" is not enough)

-> Java errors: a lot of it is nonsense to me, as I am not a Java expert.
But with increased memory it does not seem to crash, even though it complains quite a lot in the logs.

*-------schema.xml in /opt/solr/server/solr/dovecot/conf*

<?xml version="1.0" encoding="UTF-8"?>
<schema name="dovecot" version="2.0">
  <uniqueKey>id</uniqueKey>
  <fieldType name="dovecottext" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1" preserveOriginal="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="dovecotfield" class="solr.TextField" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="string" class="solr.StrField"/>
  <field name="_version_" type="string" indexed="true" stored="true"/>
  <field name="bcc" type="string" indexed="false" stored="false"/>
  <field name="body" type="dovecottext" indexed="true" stored="false"/>
  <field name="box" type="string" indexed="true" required="true" stored="true"/>
  <field name="cc" type="dovecotfield" indexed="true" stored="false"/>
  <field name="from" type="dovecotfield" indexed="true" stored="false"/>
  <field name="hdr" type="string" indexed="false" stored="false"/>
  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="subject" type="dovecottext" indexed="true" stored="false"/>
  <field name="to" type="dovecotfield" indexed="true" stored="false"/>
  <field name="uid" type="string" indexed="true" required="true" stored="true"/>
  <field name="user" type="string" indexed="true" required="true" stored="true"/>
</schema>

*-- dovecot.conf*

mail_plugins = fts fts_solr

plugin {
  plugin = fts fts_solr managesieve sieve

  fts = solr
  fts_autoindex = yes
  fts_enforced = yes
  fts_solr = url=http://127.0.0.1:8983/solr/dovecot/
  # Replace 127.0.0.1 with your Solr server's address if you use an external server

  # (...)
}

*-- /etc/systemd/system/multi-user.target.wants/solr.service*

[Unit]
Description=Solr full text search engine
After=network.target

[Service]
Type=simple
User=solr
Group=solr
PrivateTmp=yes
WorkingDirectory=/opt/solr
LimitNOFILE=65000
LimitNPROC=65000
ExecStart=/opt/solr/bin/solr start -f

[Install]
WantedBy=multi-user.target

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://dovecot.org/pipermail/dovecot/attachments/20190104/d2a25b49/attachment.html>
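The "dovecotfield" type above is what gives from, to and cc their fts_squat-like partial matching. The Python sketch below is a rough emulation of its two analyzers, meant only to illustrate the effect. It is not Solr's code, and it assumes the ClassicTokenizer output is already available as a list of tokens. The function names are hypothetical.

```python
def index_terms(tokens, min_gram=3, max_gram=25):
    """Index side of 'dovecotfield': NGram (min_gram..max_gram characters per
    token), then lowercase, then drop duplicates. TrimFilter is omitted
    because tokenizer output has no surrounding whitespace."""
    terms = []
    for token in tokens:
        for start in range(len(token)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(token):
                    break  # this gram would run past the end of the token
                terms.append(token[start:start + size].lower())
    # Simplification: Solr only drops duplicates at the same position, but
    # that makes no difference to whether a query term matches.
    return list(dict.fromkeys(terms))


def query_term(query):
    """Query side: KeywordTokenizer keeps the whole input as one term,
    then LowerCase and Trim are applied."""
    return query.lower().strip()


def matches(tokens, query, min_gram=3, max_gram=25):
    """True if the analyzed query equals one of the indexed n-grams."""
    return query_term(query) in index_terms(tokens, min_gram, max_gram)


if __name__ == "__main__":
    from_tokens = ["Joan", "Moreau"]  # hypothetical tokens from a From: header
    for q in ["moreau", " OREA ", "mo", "joan moreau"]:
        print(repr(q), matches(from_tokens, q))
```

Under these assumptions, a query term matches only if two conditions hold after lowercasing and trimming. It must be 3 to 25 characters long, and it must be a substring of a single indexed token. How Solr's query parser splits the text before analysis is not modelled here.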
Hi,

On 04/01/2019 at 05:36, Joan Moreau via dovecot wrote:
> Hi
>
> This is the summary of my work with SOLR-Dovecot, in my *quest to
> reproduce the previously excellent work of fts_squat*
>
> @Aki: Based on the time I have spent on this, I would love to see you
> update the wiki with these improvements, and add my name somewhere
>
> @All: Hope it helps

I'll be going through your description soon. I've recently independently installed fts-solr from scratch. Although this wasn't a flawless effort, I managed to get some basic indexing going. From this mail thread I understand that there are quite a few more problems than I've seen myself so far. Then again, I didn't perform extensive tests with actual searches.

Maybe we can turn all this into a test suite that we can run internally here at Dovecot. At the very least, the described Dovecot bugs need to be addressed and the wiki needs to be updated.

I'll get back to you.

Regards,

Stephan.
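Stephan suggests a test suite. In that spirit, the solrconfig.xml edits from the installation steps could be checked automatically. Below is a minimal Python sketch using only the standard library. The function name and the exact form of each check are assumptions based on the element names quoted in the steps. This is not an existing Dovecot or Solr tool. XML comments are ignored by the parser, so a commented-out element counts as missing.

```python
import sys
import xml.etree.ElementTree as ET


def check_solrconfig(xml_data):
    """Check solrconfig.xml content (str or bytes) for the edits described in
    the installation steps. Returns a list of problems; empty means all passed."""
    root = ET.fromstring(xml_data)  # comments are dropped by the default parser
    problems = []

    # <openSearcher> (inside <autoCommit>) should be true
    for elem in root.iter("openSearcher"):
        if (elem.text or "").strip() != "true":
            problems.append("openSearcher is not true")

    # <writeLockTimeout> should be present and at least 2000 (ms)
    timeouts = []
    for elem in root.iter("writeLockTimeout"):
        text = (elem.text or "").strip()
        timeouts.append(int(text) if text.isdigit() else 0)
    if not timeouts or min(timeouts) < 2000:
        problems.append("writeLockTimeout missing or below 2000")

    # The default search field should be hdr
    dfs = [(e.text or "").strip() for e in root.iter("str") if e.get("name") == "df"]
    if "hdr" not in dfs:
        problems.append('no <str name="df">hdr</str>')

    # Classic schema.xml-based schema instead of the managed schema
    factories = [e.get("class") for e in root.iter("schemaFactory")]
    if "ClassicIndexSchemaFactory" not in factories:
        problems.append("schemaFactory is not ClassicIndexSchemaFactory")

    # The schemaless field-guessing pieces should be removed
    for elem in root.iter("updateProcessor"):
        if elem.get("class") == "solr.AddSchemaFieldsUpdateProcessorFactory":
            problems.append("AddSchemaFieldsUpdateProcessorFactory still present")
    for elem in root.iter("updateRequestProcessorChain"):
        if elem.get("name") == "add-unknown-fields-to-the-schema":
            problems.append("add-unknown-fields-to-the-schema chain still present")

    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_solrconfig.py /path/to/solrconfig.xml
    with open(sys.argv[1], "rb") as f:
        found = check_solrconfig(f.read())
    for problem in found:
        print("FAIL:", problem)
    print("OK" if not found else f"{len(found)} problem(s)")
    sys.exit(1 if found else 0)
```

The script reads the file as bytes, so the `<?xml ... encoding="UTF-8"?>` declaration at the top of a real solrconfig.xml is handled by the parser. Without a path argument, the script does nothing.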
Hi Stephan,

What's up with that?

Thank you so much

On 2019-01-05 02:04, Stephan Bosch wrote:
> Maybe we can turn all this into a test suite that we can run internally
> here at Dovecot. At the very least, the described Dovecot bugs need to be
> addressed and the wiki needs to be updated.
>
> I'll get back to you.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://dovecot.org/pipermail/dovecot/attachments/20190114/fc1e92c3/attachment.html>