On 12/21/2018, 11:19:42 AM, Daniel Miller via dovecot <dovecot at dovecot.org> wrote:
> There is a *huge* difference between a functional Solr setup & squat

Interesting. Care to elaborate?

On 1/3/2019 10:56 AM, Tanstaafl wrote:
> On 12/21/2018, 11:19:42 AM, Daniel Miller via dovecot
> <dovecot at dovecot.org> wrote:
>> There is a *huge* difference between a functional Solr setup & squat
> Interesting. Care to elaborate?

This is one of those things that has to be experienced to be understood.
When you can perform an FTS search across (pause while I check current
stats...):

du -c -h /var/mail        136G
Solr numDocs:             520102

and using any IMAP client that supports server-side searches (like
Thunderbird & AquaMail) the results are basically instantaneous...it's
worth the effort. And that's searching a Dovecot virtual folder defined
as "* all", including all my archives, all my list subscriptions, and
all the shared Inbox/Sent folders from my other users.

But I certainly wish it was easier to setup.

--
Daniel

Hi,

This is a summary of my work with Solr and Dovecot, in my QUEST TO
REPRODUCE THE PREVIOUSLY EXCELLENT WORK OF FTS_SQUAT.
@Aki: given the time I have spent on this, I would love to see you
update the wiki with these improvements and add my name somewhere.
@All: hope it helps.
- INSTALLATION:
-> Create a clean install using the defaults (at least with the Arch Linux
package), then run "sudo -u solr solr create -c dovecot". The config
files are then in /opt/solr/server/solr/dovecot/conf and the data files in
/opt/solr/server/solr/dovecot/data
-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:
   * around line 313, change <openSearcher>false</openSearcher> to
     <openSearcher>true</openSearcher>
   * around line 147, set <writeLockTimeout>2000</writeLockTimeout>
     (or above)
   * around line 696, uncomment <str name="df">hdr</str>
   * around line 1127, before
     <updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>,
     add <schemaFactory class="ClassicIndexSchemaFactory"></schemaFactory>
   * around line 1161, delete the whole
     <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory"
     name="add-schema-fields">
   * around line 1192, remove the whole
     <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" ... />
-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema
-> Replace "schema.xml" with the one below to reproduce the fts_squat
behavior (equivalent to "fts_squat = partial=3 full=25" in dovecot.conf).
(Note: quite a lot of trouble to replace a single-line setup, anyway...)
-> Move /opt/solr/server/solr (or just the data subfolder) to a partition
with plenty of *space*, ideally on ext4 or a faster file system (it looks
like Solr is not considering a simple MySQL database as storage, which
would make sense to avoid all the fuss and let it move to a non-Java
state, but that is another story)
-> The dovecot.conf settings are as below
-> The systemd unit should set high limits for open files and processes
(see below)
-> Increase the memory available to the Java VM (I set 12 GB as I have
plenty of memory on my server, but adapt it to your specs): in
/opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"
-> As Solr complains a lot, you may want to filter its messages in
syslog-ng or journald, as they heavily pollute your log files
-> (Re)start Solr (first) and then Dovecot with systemctl
-> Launch the reindex (doveadm fts rescan -u <username>)
-> Wait a good while to let the system re-index all your mailboxes
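
The last steps of the list above can be sketched as a few shell functions. This is only a sketch under assumptions that are not in the post: the systemd service names "solr" and "dovecot", and the extra "doveadm index" call, which asks Dovecot to index existing mail right away rather than on the next search. check_solrconfig is a hypothetical helper that merely greps for the edited lines, so commented-out XML can fool it.

```shell
#!/bin/sh
# Sketch only. Assumed, not taken from the post: the systemd service names
# "solr" and "dovecot", and the extra "doveadm index" step.
SOLR_URL="http://127.0.0.1:8983/solr/dovecot"
SOLR_CONF="/opt/solr/server/solr/dovecot/conf/solrconfig.xml"

# Hypothetical helper: succeed only if the hand edits to solrconfig.xml
# appear to be in place. Plain grep, so commented-out XML can fool it.
check_solrconfig() {
    conf="${1:-$SOLR_CONF}"
    grep -q '<openSearcher>true</openSearcher>' "$conf" &&
    grep -q 'ClassicIndexSchemaFactory' "$conf" &&
    ! grep -q 'add-unknown-fields-to-the-schema' "$conf"
}

# Restart Solr first, then Dovecot, then rebuild the index for one user.
reindex_user() {
    user="$1"
    systemctl restart solr &&
    systemctl restart dovecot &&
    doveadm fts rescan -u "$user" &&
    doveadm index -u "$user" '*'
}

# URL that asks Solr how many documents the core holds (rows=0: count only).
solr_count_url() {
    printf '%s/select?q=*:*&rows=0&wt=json\n' "$SOLR_URL"
}

# Usage, as root:
#   check_solrconfig && reindex_user <username>
#   curl -s "$(solr_count_url)"    # look at "numFound" in the reply
```

Comparing "numFound" before and after the reindex gives a rough idea of whether mail is actually reaching the core.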
- BUGS SO FAR
-> Line 620 of the fts_solr Dovecot plugin: the size of the header is
improperly calculated ("huge header" warning for a simple email, which
kills the indexing of that email, so basically of MOST emails, as
the calculation is wrong)
-> The UID returned by Solr has to be treated as a STRING (and that may
be the source of the "out of bound" errors in the fts_solr plugin, as
"long" is not enough)
-> Java errors: a lot of nonsense to me, as I am no Java expert.
But with increased memory it no longer seems to crash, even if it still
complains quite a lot in the logs
------- schema.xml in /opt/solr/server/solr/dovecot/conf
<?xml version="1.0" encoding="UTF-8"?>
<schema name="dovecot" version="2.0">
  <uniqueKey>id</uniqueKey>

  <!-- Body and subject: word-based matching -->
  <fieldType name="dovecottext" class="solr.TextField"
             autoGeneratePhraseQueries="true"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              catenateNumbers="1"
              generateNumberParts="1" splitOnCaseChange="1"
              generateWordParts="1"
              splitOnNumerics="1" catenateAll="1"
              catenateWords="1"
              preserveOriginal="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <!-- The query is kept as a single lowercased token -->
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Address headers: substring matching on 3 to 25 character n-grams -->
  <fieldType name="dovecotfield" class="solr.TextField"
             autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3"
              maxGramSize="25"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="string" class="solr.StrField"/>

  <field name="_version_" type="string" indexed="true" stored="true"/>
  <field name="bcc" type="string" indexed="false" stored="false"/>
  <field name="body" type="dovecottext" indexed="true" stored="false"/>
  <field name="box" type="string" indexed="true" required="true" stored="true"/>
  <field name="cc" type="dovecotfield" indexed="true" stored="false"/>
  <field name="from" type="dovecotfield" indexed="true" stored="false"/>
  <field name="hdr" type="string" indexed="false" stored="false"/>
  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="subject" type="dovecottext" indexed="true" stored="false"/>
  <field name="to" type="dovecotfield" indexed="true" stored="false"/>
  <field name="uid" type="string" indexed="true" required="true" stored="true"/>
  <field name="user" type="string" indexed="true" required="true" stored="true"/>
</schema>
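
To make the "partial=3 full=25" equivalence a bit more concrete: as far as I read the schema, the dovecotfield type indexes every substring of 3 to 25 characters of each token, while the query side keeps the search term as one lowercased token, so a term of 3 to 25 characters should match wherever it occurs inside an address. The helper below only imitates which substrings get generated. ngrams is a made-up name, not a Solr or Dovecot tool, and it does not reproduce the order or positions that Solr emits.

```shell
#!/bin/sh
# Hypothetical illustration of minGramSize/maxGramSize:
# print every substring of $1 whose length is between $2 and $3.
ngrams() {
    awk -v w="$1" -v min="$2" -v max="$3" 'BEGIN {
        n = length(w)
        for (start = 1; start <= n; start++)
            for (len = min; len <= max && start + len - 1 <= n; len++)
                print substr(w, start, len)
    }'
}

# Example: the grams a token such as "dovecot" would contribute.
ngrams dovecot 3 25
```

A two-letter search term has no gram to match, and the number of grams grows quickly with token length, which is one reason the index wants a partition with plenty of space.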
-- DOVECOT.CONF
mail_plugins = fts fts_solr
plugin {
  plugin = fts fts_solr managesieve sieve
  fts = solr
  fts_autoindex = yes
  fts_enforced = yes
  # replace 127.0.0.1 with your Solr server if you want to use an
  # external server
  fts_solr = url=http://127.0.0.1:8983/solr/dovecot/
  # (...)
}
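
Once Dovecot has been restarted, it may be worth checking that these settings were actually loaded and that a body search goes through. The sketch below uses "doveconf -n" and "doveadm search"; the function names and the choice of INBOX are my own assumptions for illustration.

```shell
#!/bin/sh
# Keep only the mail_plugins and fts* lines from a doveconf dump on stdin.
filter_fts() {
    grep -E '^[[:space:]]*(mail_plugins|fts)'
}

# Show the FTS-related settings Dovecot is running with.
show_fts_settings() {
    doveconf -n | filter_fts
}

# Search the body of one user's INBOX for a word.
# $1 = user name, $2 = search term
search_body() {
    doveadm search -u "$1" mailbox INBOX body "$2"
}

# Usage:
#   show_fts_settings
#   search_body <username> invoice
```

With fts_enforced = yes, a search that cannot reach Solr should show up as an error instead of quietly falling back to a slow scan, which makes this a reasonable smoke test.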
-- /etc/systemd/system/multi-user.target.wants/solr.service
[Unit]
Description=Solr full text search engine
After=network.target
[Service]
Type=simple
User=solr
Group=solr
PrivateTmp=yes
WorkingDirectory=/opt/solr
LimitNOFILE=65000
LimitNPROC=65000
ExecStart=/opt/solr/bin/solr start -f
[Install]
WantedBy=multi-user.target
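
systemd directive names are case-sensitive, and a misspelled one (for example LIMITNOFILE instead of LimitNOFILE) is reported as an unknown key and ignored, so the limits silently stay at their defaults. A small sketch for checking this follows. miscased_limits is a hypothetical helper, and the service name "solr" is assumed from the unit file name.

```shell
#!/bin/sh
# Hypothetical helper: print LimitNOFILE/LimitNPROC lines in unit file $1
# whose directive name is not spelled exactly as systemd expects.
miscased_limits() {
    grep -iE '^limit(nofile|nproc)=' "$1" |
        grep -vE '^Limit(NOFILE|NPROC)='
}

# Reload unit files, restart Solr and print the limits systemd applied.
apply_and_show_limits() {
    systemctl daemon-reload &&
    systemctl restart solr &&
    systemctl show solr -p LimitNOFILE -p LimitNPROC
}

# Usage, as root:
#   miscased_limits /etc/systemd/system/multi-user.target.wants/solr.service
#   apply_and_show_limits
```

If the values printed by "systemctl show" are not the 65000 from the unit file, the directives were most likely not picked up.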

On 1/3/2019, 3:07:18 PM, Daniel Miller via dovecot <dovecot at dovecot.org> wrote:
> On 1/3/2019 10:56 AM, Tanstaafl wrote:
>> On 12/21/2018, 11:19:42 AM, Daniel Miller via dovecot
>> <dovecot at dovecot.org> wrote:
>>> There is a *huge* difference between a functional Solr setup & squat
>> Interesting. Care to elaborate?
>
> This is one of those things that has to be experienced to be
> understood. When you can perform an FTS search across (pause while I
> check current stats...):
>
> du -c -h /var/mail        136G
>
> Solr numDocs:             520102
>
> and using any IMAP client that supports server-side searches (like
> Thunderbird & AquaMail) the results are basically instantaneous...it's
> worth the effort. And that's searching a Dovecot virtual folder defined
> as "* all", including all my archives, all my list subscriptions, and
> all the shared Inbox/Sent folders from my other users.
>
> But I certainly wish it was easier to setup.

Thanks Daniel...

So, as one who has no experience of the benefit of either... How does
this compare with Squat? Meaning, Is it exponentially faster? Twice as
fast?