I hope you are aware that "linking with Xapian" requires somewhat more work than just passing -lxapian to the linker? If you or someone feels like writing fts_xapian, go for it.

Aki

> On 04 January 2019 at 08:20 Joan Moreau via dovecot <dovecot at dovecot.org> wrote:
>
> What about considering linking Dovecot with Xapian libraries instead of going to nightmare Solr?
>
> https://xapian.org/features
>
> On 2019-01-02 17:10, John Tulp wrote:
>
> > On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:
> >
> > The main problem is: After some time of indexing from Dovecot, Dovecot returns errors (invalid SID, etc...) and Solr returns "out of range indexes" errors
> >
> > I've been watching the progress of this thread with no small concern, mainly because I've been tasked with providing a server-side email search facility with a budget and manpower level that comes down to mainly *1*, i.e., me.
> >
> > I was expecting, given the strongly worded language about "just use lucene/SOLR" and "ignore squat", that I should invest time + effort into this JAVA nightmare that is SOLR.
> >
> > I started with squat and another word-indexer system that used out-of-band (not a dovecot plugin) software to provide rapid (sub-second) searches through tens-of-GB-scale mailboxes.
> >
> > Unlike what I was led to believe, the squat indexes worked surprisingly well, once you sorted out the odd ulimit-related resource size limitations (vsz & friends). I did notice the "worst-case" search performance has worryingly high O(x) increases in time, but I'd not seen anything that was a dealbreaker. It goes without saying that various substring searches worked as expected, for the most part.
> >
> > My experiences with SOLR were similar to Messr. Moreau's: lots of startup errors with provided schemata files. Lots of JAVA nonsense issues. Lots of sensitivity to WHICH Java runtime, etc, etc. I finally fixed on a specific JVM, version of SOLR, and dovecot to find the "best" working combination, only to find that the searches didn't work out as expected. I expected to be able to do date-ranging based searches. Didn't work. I expected to search CONTENTS of emails, and despite many days of tweaks, I couldn't get it to index even the basics like filenames/types of attachments, so I could expose attachment-based searching to my users.
> >
> > So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the following functionality:
> >
> > 1) The ability to search for a string within any of the structured fields (from/subject) that returns correct results?
> >
> > 2) The ability to search for any string within the BODY of emails, including the MIME attachment boundaries?
> >
> > 3) The ability to do "ranging" searches for structures within emails that decompose to "dates" or other simple-numeric data?
> >
> > OPTIONALLY, and this is probably way outside of the scope of the above, despite the fact that it's listed as a "selling point" of SOLR versus other full text search engines:
> >
> > 4) The ability to do searches against any attachments that are able to be post-processed and hyper-indexed by SOLR+Tika?
> >
> > -------------
> >
> > SOLR seems to have "brand cachet", so presumably it actually works (for somebody).
> >
> > Dovecot has not a little "brand cachet", and for me, I have innate faith and trust in Timo and his software. I am no stranger to the "costs" of "free" software, in that you sacrifice your own blood, sweat, and tears just to get these disparate pieces to work together.
> >
> > I *DO* respect that Timo has to keep the lights (and sauna) on in Finland. Maybe there's a super-secret (no advertised prices, "carrier-only" price list) with _Dovecot, Oy_ wherein the above ARE actually available for something less than 6.022 x 10^23 Euros per centi-second of licencing fees.
> >
> > But please, level with us faithful users. Does this morass of Java B.S. actually work, and if not, please just deprecate and remove this moribund software, and stop trying to bury the only FTS plugin many of us HAVE actually gotten to work. (Pretty please?)
> >
> > I respect that Messr. Moreau has made an earnest effort to get this JAVA B.S. to actually work, as I have.
> >
> > He persevered where I'd given up. He's vocal about it, and now I'm chiming in that this ornate collection of switchblades only cuts those who try to use them.
> >
> > Respectfully,
> > =M=
>
> Fascinating...
>
> SOLR says the following are powered by SOLR...
>
> https://wiki.apache.org/solr/PublicServers
>
> Perhaps if you could find out from that list which of them are using SOLR in conjunction with Dovecot...
>
> food for thought...
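For readers unsure what the three baseline capabilities in the quoted message amount to (matching a word in a structured header field, matching a word in the body, and selecting by date range), here is a minimal sketch over a toy in-memory index. Everything in it (`Message`, `ToyMailIndex`, the method names, the sample messages) is hypothetical and made up for illustration; it is not Dovecot's, Solr's, or Xapian's API, and it only matches whole whitespace-separated words.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Message:
    uid: int
    sender: str
    subject: str
    body: str
    sent: date


class ToyMailIndex:
    """Hypothetical in-memory index; not any real FTS engine's API."""

    def __init__(self):
        self._messages = {}   # uid -> Message
        self._postings = {}   # (field, token) -> set of uids

    def add(self, msg):
        self._messages[msg.uid] = msg
        for field in ("sender", "subject", "body"):
            # Naive tokenizer: lowercase, split on whitespace.
            for token in getattr(msg, field).lower().split():
                self._postings.setdefault((field, token), set()).add(msg.uid)

    def search_field(self, field, word):
        """Whole-word match in one field (items 1 and 2 in the list)."""
        return sorted(self._postings.get((field, word.lower()), set()))

    def search_dates(self, start, end):
        """Inclusive date-range match (item 3 in the list)."""
        return sorted(
            uid for uid, m in self._messages.items() if start <= m.sent <= end
        )


index = ToyMailIndex()
index.add(Message(1, "alice@example.org", "Quarterly report",
                  "see attached spreadsheet", date(2019, 1, 2)))
index.add(Message(2, "bob@example.org", "Lunch",
                  "report to the lobby at noon", date(2019, 1, 4)))

print(index.search_field("subject", "report"))                # header search
print(index.search_field("body", "report"))                   # body search
print(index.search_dates(date(2019, 1, 3), date(2019, 1, 5)))  # date range
```

Note that a word-based index like this one cannot answer the substring searches that the quoted message reports working with squat; that would need a different index layout.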
Why not, but please guide me on the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin.

On 2019-01-04 17:20, Aki Tuomi wrote:
> I hope you are aware that "linking with Xapian" requires somewhat more work than just -lxapian in linker? If you or someone feels like writing fts_xapian, go for it.
>
> Aki
A starting point would be to have a look at the current FTS plugins:

https://github.com/dovecot/core/tree/master/src/plugins/fts-solr
and
https://github.com/dovecot/core/tree/master/src/plugins/fts-squat

-M

On Friday, 04.01.2019, at 18:17 +0800, Joan Moreau via dovecot wrote:
> Why not, but please guide me about the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin
On 04/01/2019 at 11:17, Joan Moreau via dovecot wrote:
> Why not, but please guide me about the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin

The Dovecot API documentation is not exhaustive everywhere, but the basics are documented. The remaining questions can be answered by looking at examples found in similar plugins or the relevant API sources.

I know of one FTS plugin not written by Dovecot developers:

https://github.com/atkinsj/fts-elasticsearch

If you really wish to do something like this, just go ahead. It will not be a small effort though. As soon as you have concrete questions, we can help you (don't expect rapid responses though).

Regards,

Stephan.
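As rough orientation only, the responsibilities a full-text search backend generally has to cover (remembering how far indexing has progressed, adding text incrementally, forgetting expunged messages, answering lookups) might be sketched as below. Every class and method name here is hypothetical and deliberately not taken from Dovecot's headers; the actual mandatory functions have to be read from the plugin sources linked in this thread.

```python
from abc import ABC, abstractmethod


class FtsBackend(ABC):
    """Hypothetical interface for illustration; NOT Dovecot's plugin API."""

    @abstractmethod
    def last_indexed_uid(self, mailbox):
        """Highest UID already indexed, so indexing can resume from there."""

    @abstractmethod
    def index_message(self, mailbox, uid, text):
        """Add one message's text to the index."""

    @abstractmethod
    def expunge(self, mailbox, uid):
        """Drop a deleted message from the index."""

    @abstractmethod
    def lookup(self, mailbox, word):
        """Return the sorted UIDs of messages containing the word."""


class MemoryBackend(FtsBackend):
    """Toy backend keeping everything in dictionaries."""

    def __init__(self):
        self._postings = {}  # (mailbox, token) -> set of uids
        self._last_uid = {}  # mailbox -> highest uid indexed so far

    def last_indexed_uid(self, mailbox):
        return self._last_uid.get(mailbox, 0)

    def index_message(self, mailbox, uid, text):
        for token in text.lower().split():
            self._postings.setdefault((mailbox, token), set()).add(uid)
        self._last_uid[mailbox] = max(uid, self.last_indexed_uid(mailbox))

    def expunge(self, mailbox, uid):
        # Only the sets are mutated, so iterating the dict is safe.
        for (box, _token), uids in self._postings.items():
            if box == mailbox:
                uids.discard(uid)

    def lookup(self, mailbox, word):
        return sorted(self._postings.get((mailbox, word.lower()), set()))


backend = MemoryBackend()
backend.index_message("INBOX", 1, "hello xapian")
backend.index_message("INBOX", 2, "hello solr")
print(backend.lookup("INBOX", "hello"))
backend.expunge("INBOX", 1)
print(backend.lookup("INBOX", "hello"))
```

A real backend would replace the dictionaries with calls into the search library and would also have to deal with persistence, locking, and error reporting, which is presumably part of why this is "not a small effort".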