thr3ads.net - dovecot - Solr - complete setup (update) [Jan 2019]


Joan Moreau

2019-Jan-26 19:07 UTC

Solr - complete setup (update)

> *- Installation:*
> 
> -> Create a clean install using the defaults (at least in the Arch Linux
> package), and run "sudo -u solr solr create -c dovecot". The config
> files are then in /opt/solr/server/solr/dovecot/conf and the data files in
> /opt/solr/server/solr/dovecot/data
On my system (Debian) these directories are wildly different (e.g. data
is under /var), but other than that, this information is OK.

Used this as a side-reference for Debian installation:
https://tecadmin.net/install-apache-solr-on-debian/

Accessed http://solr-host.tld:8983/solr/ to check whether all is OK. 
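
The same check can be scripted. This is a minimal sketch, assuming Solr's CoreAdmin STATUS endpoint under /solr/admin/cores and the core name "dovecot" from the create command quoted above; solr-host.tld is the placeholder host used in this message, and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def core_status_url(base_url: str, core: str) -> str:
    """Build the CoreAdmin STATUS URL for one core (assumed endpoint layout)."""
    query = urllib.parse.urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def core_exists(status: dict, core: str) -> bool:
    """Return True if a parsed STATUS response describes the given core.

    Assumption: Solr answers with an empty object for a core it does not know.
    """
    return bool(status.get("status", {}).get(core))


def fetch_status(base_url: str, core: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the STATUS response (needs a reachable Solr)."""
    url = core_status_url(base_url, core)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With a running instance, `core_exists(fetch_status("http://solr-host.tld:8983/solr", "dovecot"), "dovecot")` should tell whether the core was created.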

MAKE SURE YOU HAVE A DOVECOT INSTANCE (NOT THE DEFAULT INSTANCE), CREATED
WITH THE COMMAND BELOW:

solr create -c dovecot (OR WHATEVER NAME)
> Weirdly, rescan returns immediately here. When I perform `doveadm index
> INBOX` for my test user, I do see a lot of fts and HTTP activity.
THE SOLR PLUGIN IS NOT FULLY IMPLEMENTED; THE REFRESH AND RESCAN FUNCTIONS
ARE MISSING:

https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/fts-backend-solr.c


static int fts_backend_solr_refresh(struct fts_backend *backend ATTR_UNUSED)
{
    return 0;
}

static int fts_backend_solr_rescan(struct fts_backend *backend)
{
    /* FIXME: proper rescan needed. for now we'll just reset the
       last-uids */
    return fts_backend_reset_last_uids(backend);
}
> *- Bugs so far*
> 
> -> Line 620 of the fts_solr dovecot plugin: the size of the header is
> improperly calculated ("huge header" warning for a simple email, which
> kills the indexing of that email, so basically MOST emails, as the
> calculation is wrong)
YOU CAN CHECK THAT REGULARLY IN THE DOVECOT LOG FILE. MY GUESS IS THAT THE
MIX OF UNICODE IS NOT PROPERLY HANDLED HERE.
> -> The UID returned by Solr is to be considered as a STRING (and that is
> maybe the source of the "out of bound" errors in fts_solr
> dovecot, as "long" is not enough)
THIS IS JUST HIGHLY VISIBLE IN SOLR SCHEMA.XML. SWITCHING IT TO "LONG"
IN SCHEMA.XML RETURNS PLENTY OF ERRORS.
> -> Java errors: A lot of nonsense to me, I am not an expert in Java.
> But with increased memory it seems not to crash, even if it complains
> quite a lot in the logs
> 
> Can you elaborate on the errors you have seen so far? When do these happen?
> How can I reproduce them?
HONESTLY, I HAVE NO CLUE WHAT THE PROBLEMS ARE. I JUST INCREASED THE
MEMORY OF THE JVM AND THE SYSTEMS STOPPED CRASHING. LOG FILES ARE HUGE
ANYWAY.

Stephan Bosch

2019-Jan-29 23:33 UTC



(forgot to CC mailing list)

On 26/01/2019 at 20:07, Joan Moreau via dovecot wrote:
>>
>> *- Bugs so far*
>>
>> -> Line 620 of the fts_solr dovecot plugin: the size of the header is
>> improperly calculated ("huge header" warning for a simple email,
>> which kills the indexing of that email, so basically MOST
>> emails, as the calculation is wrong)
> *You can check that regularly in dovecot log file. My guess is the mix 
> of Unicode which is not properly addressed here.*
Does this happen with specific messages? Do you have a sample message 
for me? I don't see how Unicode could cause this.
>>
>> -> The UID returned by Solr is to be considered as a STRING (and that
>> is maybe the source of the "out of bound" errors in
>> fts_solr dovecot, as "long" is not enough)
> *This is just highly visible in Solr schema.xml. Switching it to 
> "long" in schema.xml returns plenty of errors.*
I cannot reproduce this so far (see modified schema below). In a simple 
test I just get the desired results and no errors logged.
>>
>> -> Java errors: A lot of nonsense to me, I am not an expert in Java.
>> But with increased memory it seems not to crash, even if it
>> complains quite a lot in the logs
>>
>> Can you elaborate on the errors you have seen so far? When do these 
>> happen? How can I reproduce them?
>>
> *Honestly, I have no clue what the problems are. I just increased the 
> memory of the JVM and the systems stopped crashing. Log files are huge 
> anyway.*
What errors do you see? I see only INFO entries in my 
/var/solr/logs/solr.log. Looks like Solr is pretty verbose by default 
(lots of INFO output), but there must be a way to reduce that.

Regards,

Stephan.


<?xml version="1.0" encoding="UTF-8"?>
<schema name="dovecot" version="2.0">
  <uniqueKey>id</uniqueKey>
  <fieldType name="long" class="solr.LongPointField" positionIncrementGap="0"/>
  <fieldType name="dovecottext" class="solr.TextField"
             autoGeneratePhraseQueries="true" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1"
              generateWordParts="1" splitOnNumerics="1" catenateAll="1"
              catenateWords="1" preserveOriginal="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="dovecotfield" class="solr.TextField"
             autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="string" class="solr.StrField"/>
  <field name="_version_" type="string" indexed="true" stored="true"/>
  <field name="bcc" type="string" indexed="false" stored="false"/>
  <field name="body" type="dovecottext" indexed="true" stored="false"/>
  <field name="box" type="string" indexed="true" required="true" stored="true"/>
  <field name="cc" type="dovecotfield" indexed="true" stored="false"/>
  <field name="from" type="dovecotfield" indexed="true" stored="false"/>
  <field name="hdr" type="string" indexed="false" stored="false"/>
  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="subject" type="dovecottext" indexed="true" stored="false"/>
  <field name="to" type="dovecotfield" indexed="true" stored="false"/>
  <field name="uid" type="long" indexed="true" required="true" stored="true"/>
  <field name="user" type="string" indexed="true" required="true" stored="true"/>
</schema>
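
To see at a glance which Solr class backs each field in a schema like the one above (in particular whether uid is declared as "long" or "string"), something along these lines may help. It is only a sketch using Python's standard XML parser; the excerpt is trimmed from the schema above and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the schema above; only the parts needed for the lookup.
SCHEMA_EXCERPT = """
<schema name="dovecot" version="2.0">
  <uniqueKey>id</uniqueKey>
  <fieldType name="long" class="solr.LongPointField" positionIncrementGap="0"/>
  <fieldType name="string" class="solr.StrField"/>
  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="uid" type="long" indexed="true" required="true" stored="true"/>
</schema>
"""


def field_classes(schema_xml: str) -> dict:
    """Map each <field> name to the class of the <fieldType> it refers to."""
    root = ET.fromstring(schema_xml.strip())
    type_classes = {ft.get("name"): ft.get("class") for ft in root.iter("fieldType")}
    return {f.get("name"): type_classes.get(f.get("type")) for f in root.iter("field")}


if __name__ == "__main__":
    for name, cls in field_classes(SCHEMA_EXCERPT).items():
        print(f"{name}: {cls}")
```

Running the same function on a schema where uid was switched to "string" would report solr.StrField for that field instead.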

Joan Moreau

2019-Jan-30 05:58 UTC



On 2019-01-30 07:33, Stephan Bosch wrote:
> (forgot to CC mailing list)
> 
> On 26/01/2019 at 20:07, Joan Moreau via dovecot wrote:
> 
> *- Bugs so far*
> 
> -> Line 620 of the fts_solr dovecot plugin: the size of the header is
> improperly calculated ("huge header" warning for a simple email, which
> kills the indexing of that email, so basically MOST emails, as the
> calculation is wrong)
> *You can check that regularly in dovecot log file. My guess is the mix
> of Unicode which is not properly addressed here.*
Does this happen with specific messages? Do you have a sample message
for me? I don't see how Unicode could cause this.

MY ONLY GUESS IS THAT IT RELIES ON SOME 'STRLEN', WHICH IS OF COURSE WRONG
FOR UNICODE EMAILS. THIS IS JUST A GUESS.

BUT DO A GREP FOR "HUGE" IN THE DOVECOT LOG OF A BUSY SERVER TO FIND
EXAMPLES.

(SORRY, I SWITCHED TO XAPIAN, AS SOLR WAS CAUSING TOO MUCH TROUBLE ON
MY SERVER, SO I HAVE NO MORE CONCRETE EXAMPLES)
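
To make the 'strlen' guess concrete: C's strlen counts bytes, so for UTF-8 text it reports more than the number of characters. The sketch below shows only that mismatch, in Python. It does not reproduce the plugin's header-size check, whether this plays any part in the "huge header" warning is not established in this thread, and the sample strings are made up.

```python
from typing import Tuple


def utf8_lengths(text: str) -> Tuple[int, int]:
    """Return (code points, UTF-8 bytes) for a string.

    The byte count is what a C strlen() would report for the same text
    stored as UTF-8 (assuming no embedded NUL characters).
    """
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    # Escapes keep the source ASCII-only: \u00e9 is e-acute, \u00e0 is a-grave.
    for sample in ("Invoice 2019", "R\u00e9union \u00e0 15h"):
        chars, nbytes = utf8_lengths(sample)
        print(f"{sample!a}: {chars} characters, {nbytes} bytes")
```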
>> -> The UID returned by Solr is to be considered as a STRING (and
>> that is maybe the source of the "out of bound" errors in
>> fts_solr dovecot, as "long" is not enough)
> *This is just highly visible in Solr schema.xml. Switching it to
> "long" in schema.xml returns plenty of errors.*
I cannot reproduce this so far (see modified schema below). In a simple
test I just get the desired results and no errors logged.

I got this with large mailboxes (where the UID seems not to be acceptable
to Solr). The fault is not on the Dovecot side but in Solr: the UIDs
returned for a search are garbage instead of proper values. Declaring the
field as string solves this.
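
For reference on the size question: IMAP defines a UID as an unsigned 32-bit value (RFC 3501), and a 64-bit signed integer covers that whole range, while a 32-bit signed one does not. The sketch below only checks those ranges. It does not explain the garbage values reported above, and the function name is made up for illustration.

```python
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1
INT64_MAX = 2**63 - 1


def smallest_fitting_type(uid: int) -> str:
    """Name the narrowest of three integer widths that can hold uid."""
    if uid < 0:
        raise ValueError("IMAP UIDs are never negative")
    if uid <= INT32_MAX:
        return "int32"
    if uid <= UINT32_MAX:
        return "uint32"
    if uid <= INT64_MAX:
        return "int64"
    raise ValueError("value does not fit in 64 bits")


if __name__ == "__main__":
    # The last value is the largest UID IMAP allows.
    for uid in (1, INT32_MAX, INT32_MAX + 1, UINT32_MAX):
        print(uid, "->", smallest_fitting_type(uid))
```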
>> -> Java errors: A lot of nonsense to me, I am not an expert in Java.
>> But with increased memory it seems not to crash, even if it complains
>> quite a lot in the logs
>> 
>> Can you elaborate on the errors you have seen so far? When do these
>> happen? How can I reproduce them?
> *Honestly, I have no clue what the problems are. I just increased the
> memory of the JVM and the systems stopped crashing. Log files are huge anyway.*
What errors do you see? I see only INFO entries in my
/var/solr/logs/solr.log. Looks like Solr is pretty verbose by default
(lots of INFO output), but there must be a way to reduce that.

I DELETED SOLR. NO MORE LOGS. MAYBE SOMEONE ELSE CAN TELL. 
