If substring_search is true, the indexes can unfortunately be quite large (40%+
of mailbox data size). This is because Xapian does not natively support
substring searching, so we have to fake it by storing redundant data in the
index.
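To illustrate why that redundancy is expensive, here is a rough sketch of one
common way to fake substring search on top of a prefix-only index: store every
suffix of each term, so a prefix lookup on a suffix finds matches in the middle
of a word. This is an illustration only, not necessarily what flatcurve does
internally; the function names and the min_len cutoff are made up for the
example.

```python
def suffix_terms(term, min_len=3):
    """Return the term itself plus every suffix of at least min_len chars.

    Hypothetical helper: min_len is an assumed cutoff, not a flatcurve setting.
    """
    return [term] + [term[i:] for i in range(1, len(term) - min_len + 1)]


def indexed_chars(words, min_len=3):
    """Compare characters stored without and with suffix expansion."""
    plain = sum(len(w) for w in words)
    expanded = sum(len(s) for w in words for s in suffix_terms(w, min_len))
    return plain, expanded


if __name__ == "__main__":
    print(suffix_terms("mailbox"))
    print(indexed_chars(["mailbox"]))
```

Even for a single seven-letter word the stored term data more than triples,
which gives a feel for how the index grows once every term is expanded.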
If substring_search is false, index size is generally < 10% of mailbox
storage size.
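For back-of-the-envelope capacity planning, those two figures can be turned
into a quick estimate. A minimal sketch, assuming the rough ratios quoted above
(40% is closer to a floor with substring search on, 10% closer to a ceiling
with it off); the function name is hypothetical and real numbers will vary with
your data.

```python
def estimate_index_size(mailbox_size, substring_search):
    """Rough index size estimate in the same unit as mailbox_size.

    Uses the approximate ratios from this thread: about 40% (or more) with
    substring search enabled, under about 10% with it disabled.
    """
    percent = 40 if substring_search else 10
    return mailbox_size * percent // 100


if __name__ == "__main__":
    mailbox_mb = 100_000  # example: roughly 100 GB of mail, expressed in MB
    print("substring on :", estimate_index_size(mailbox_mb, True), "MB or more")
    print("substring off:", estimate_index_size(mailbox_mb, False), "MB or less")
```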
There's some (older) benchmarking on this at
https://github.com/slusarz/dovecot-fts-flatcurve#indexing-benchmark-with-substring-matching-enabled-default-configuration
Obviously, this depends on the local mix of message data you are indexing.
The number of attachments, the language, the media type of text parts (e.g.
plain vs. html), etc. are all variables that may change storage size.
I don't know how storage compares with Solr. flatcurve and Solr serve two
completely different use cases, however, so I'm not sure how useful that
comparison is.
michael
> On 11/03/2021 2:26 PM Marc <marc at f1-outsourcing.eu> wrote:
>
>
>
> Is there some info on what to expect how big these indexes can get (%
> mailbox)? Is there any differences between solr / xapian storage use?
>
> https://github.com/slusarz/dovecot-fts-flatcurve/
>