Alessio Cecchi
2021-Sep-02 14:26 UTC
Dovecot - FTS Solr: disk usage & position information?
Hi Vincent,

thanks for your investigations!

On 01/09/21 11:27, Vincent Brillault wrote:
> Dear all,
>
> Just a status update, in case this can help others.
>
> We went forward, disabled the position information indexing and
> re-indexed our mail data (over a couple of days to avoid
> overloading the systems). Before the re-indexing we had 1.33 TiB in
> our Solr indexes. After re-indexing, we had only 542 GiB: that's a
> 60% reduction of our storage requirements for our FTS indexes :)

Does this optimization also reduce the RAM requirements on the Solr server?

> So far, our users have not reported any issues or measurable
> differences concerning the quality of the FTS. From further
> debugging, as discussed on the solr-user mailing list
> (https://lists.apache.org/thread.html/rcdf8bb97be0839e57928ad5fa34501ec8a73392c11248db91206bc33%40%3Cusers.solr.apache.org%3E),
> I've come to the conclusion that, with the current integration between
> Dovecot and Solr (esp. the fact that `"` is escaped), it's impossible
> to trigger phrase queries from user queries as long as
> autoGeneratePhraseQueries is false.
>
> I've attached the schema.xml and solrconfig.xml we are now using with
> Solr 8.6.0, in case there is any interest from others. Let me know if
> you prefer a MR to update the xmls present in
> https://github.com/dovecot/core/tree/master/doc.

Do the attached schema and config files also work with Solr 7.7.0? Since Dovecot provides a schema and config for 7.7.0, a patch based on them would be useful for many of us.

Thanks
--
Alessio Cecchi
Postmaster @ http://www.qboxmail.it
https://www.linkedin.com/in/alessice
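As a quick check of the figures quoted in this message, the arithmetic can be sketched as follows. This assumes both sizes are in binary units exactly as written (1 TiB = 1024 GiB); the variable names are only illustrative.

```python
# Index sizes quoted in the thread (binary units).
before_gib = 1.33 * 1024   # 1.33 TiB expressed in GiB
after_gib = 542.0          # index size after re-indexing without positions

saved_gib = before_gib - after_gib
reduction = saved_gib / before_gib  # fraction of the original size that was freed

print(f"before: {before_gib:.0f} GiB, after: {after_gib:.0f} GiB")
print(f"saved:  {saved_gib:.0f} GiB ({reduction:.0%} reduction)")
```

In other words, the new indexes take roughly 40% of the original space, which matches the "60%" saving mentioned above.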
Vincent Brillault
2021-Sep-06 06:58 UTC
Dovecot - FTS Solr: disk usage & position information?
Hi Alessio,

> Does this optimization also reduce the RAM requirements on the Solr server?

Unfortunately we didn't measure this before/after the change. Since we are removing features (position information), I wouldn't expect the memory requirement to increase, but I'm no expert. To be honest, I've not been able to measure in any sensible way the memory really required by Solr. The memory directly used by the Solr process is rather limited, but a lot of memory is used for file caches, which also seems (again, not an expert) important for good performance.

> Do the attached schema and config files also work with Solr 7.7.0? Since
> Dovecot provides a schema and config for 7.7.0, a patch based on them
> would be useful for many of us.

The solrconfig.xml, at least, shouldn't work with 7.7.0 as-is, since I increased the luceneMatchVersion to match 8.6 and imported a few defaults from the default upstream 8.6 configuration. I think these changes could be left out for 7.7.0.

For schema.xml, I made quite a few changes, but all seem to be backward compatible:
- Removed the unused 'boolean' field type
- Removed KeywordMarkerFilterFactory: protwords are usually empty anyway
- Used a simpler 'text_basic' field type (no StopFilterFactory, SynonymGraphFilterFactory or PorterStemFilterFactory) for processing non-human fields (all but body and subject)
- Replaced autoGeneratePhraseQueries & positionIncrementGap with omitTermFreqAndPositions="true" & omitPositions="true" on TextField field types (as discussed in this thread)
- Made minor modifications to WordDelimiterGraphFilterFactory when used in search, to get better matches (things like 'covid19' are indexed as ['covid', '19', 'covid19'] but only searched as 'covid19')

From taking a quick look at the documentation, I _think_ most of them are compatible with 7.7.0, but without testing, I can't guarantee it.

Cheers,
Vincent
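The index-time versus search-time behaviour described for 'covid19' can be illustrated with a toy model. This is only a sketch of the idea, not Solr's WordDelimiterGraphFilterFactory implementation and not the attached schema; the helper names (`split_alnum`, `index_tokens`, `query_tokens`, `matches`) are hypothetical.

```python
import re


def split_alnum(token: str) -> list[str]:
    """Split a token into runs of letters and runs of digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def index_tokens(token: str) -> list[str]:
    """Index time: keep the parts and, if the token was split, the whole token."""
    parts = split_alnum(token)
    return parts + [token] if len(parts) > 1 else [token]


def query_tokens(token: str) -> list[str]:
    """Search time: the user's term is kept whole, not split."""
    return [token]


def matches(query: str, indexed_word: str) -> bool:
    """True if any search-time token equals an index-time token of the word."""
    indexed = index_tokens(indexed_word)
    return any(q in indexed for q in query_tokens(query))


print(index_tokens("covid19"))       # ['covid', '19', 'covid19']
print(query_tokens("covid19"))       # ['covid19']
print(matches("covid", "covid19"))   # True: a part of the indexed word
print(matches("covid19", "covid"))   # False: the query is not split into parts
```

Under this model, a search for 'covid' still finds mail containing 'covid19', while a search for 'covid19' no longer matches mail that only contains 'covid' or '19', which is one plausible reading of the "better match" mentioned above.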