A starting point would be to have a look at the current FTS plugins:

https://github.com/dovecot/core/tree/master/src/plugins/fts-solr
and
https://github.com/dovecot/core/tree/master/src/plugins/fts-squat

-M

On Friday, 04.01.2019, at 18:17 +0800, Joan Moreau via dovecot wrote:
> Why not, but please guide me about the core structure (mandatory
> functions, etc.) of a typical Dovecot FTS plugin.
>
> On 2019-01-04 17:20, Aki Tuomi wrote:
> > I hope you are aware that "linking with Xapian" requires somewhat
> > more work than just -lxapian in linker? If you or someone feels
> > like writing fts_xapian, go for it.
> >
> > Aki
> >
> > > On 04 January 2019 at 08:20 Joan Moreau via dovecot
> > > <dovecot at dovecot.org> wrote:
> > >
> > > What about considering linking Dovecot with Xapian libraries
> > > instead of going to nightmare Solr?
> > >
> > > https://xapian.org/features
> > >
> > > On 2019-01-02 17:10, John Tulp wrote:
> > > > On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:
> > > > > > The main problem is: after some time of indexing from
> > > > > > Dovecot, Dovecot returns errors (invalid SID, etc.) and
> > > > > > Solr returns "out of range indexes" errors.
> > > > >
> > > > > I've been watching the progress of this thread with no small
> > > > > concern, mainly because I've been tasked with providing a
> > > > > server-side email search facility with a budget and manpower
> > > > > level that comes down to mainly *1*, i.e., me.
> > > > >
> > > > > I was expecting, given the strongly worded language about
> > > > > "just use lucene/SOLR" and "ignore squat", that I should
> > > > > invest time + effort into this JAVA nightmare that is SOLR.
> > > > >
> > > > > I started with squat and another word-indexer system that
> > > > > used out-of-band (not a dovecot plugin) software to provide
> > > > > rapid (sub-second) searches through tens-of-GB-scale
> > > > > mailboxes.
> > > > >
> > > > > Unlike what I was led to believe, the squat indexes worked
> > > > > surprisingly well, once you sorted out the odd resource size
> > > > > (ulimit-related) issues (vsz & friends). I did notice the
> > > > > "worst-case" search performance had worryingly high O(x)
> > > > > increases in time, but I'd not seen anything that was a
> > > > > dealbreaker. It goes without saying that various substring
> > > > > searches worked as expected, for the most part.
> > > > >
> > > > > My experiences with SOLR were similar to Messr. Moreau's:
> > > > > lots of startup errors with provided schemata files. Lots of
> > > > > JAVA nonsense issues. Lots of sensitivity to WHICH Java
> > > > > runtime, etc, etc. I finally fixed on a specific JVM,
> > > > > version of SOLR, and dovecot to find the "best" working
> > > > > combination, only to find that the searches didn't work out
> > > > > as expected. I expected to be able to do date-ranging based
> > > > > searches. Didn't work. I expected to search CONTENTS of
> > > > > emails, and despite many days of tweaks, I couldn't get it
> > > > > to index even the basics like filenames/types of
> > > > > attachments, so I could expose attachment-based searching
> > > > > to my users.
> > > > >
> > > > > So, without rancour or antipathy, I ask the entire list: has
> > > > > ANYONE gotten a Dovecot/solr-fts-plugin setup to work that
> > > > > provides as a BASELINE, all of the following functionality:
> > > > >
> > > > > 1) The ability to search for a string within any of the
> > > > > structured fields (from/subject) that returns correct
> > > > > results?
> > > > >
> > > > > 2) The ability to search for any string within the BODY of
> > > > > emails, including the MIME attachment boundaries?
> > > > >
> > > > > 3) The ability to do "ranging" searches for structures
> > > > > within emails that decompose to "dates" or other
> > > > > simple-numeric data?
> > > > >
> > > > > OPTIONALLY, and this is probably way outside of the scope of
> > > > > the above, despite the fact that it's listed as a "selling
> > > > > point" of SOLR versus other full text search engines:
> > > > >
> > > > > 4) The ability to do searches against any attachments that
> > > > > are able to be post-processed and hyper-indexed by
> > > > > SOLR+Tika?
> > > > >
> > > > > -------------
> > > > >
> > > > > SOLR seems to have "brand cachet", so presumably it actually
> > > > > works (for somebody).
> > > > >
> > > > > Dovecot has not a little "brand cachet", and for me, I have
> > > > > innate faith and trust in Timo and his software. I am no
> > > > > stranger to the "costs" of "free" software, in that you
> > > > > sacrifice your own blood, sweat, and tears just to get these
> > > > > disparate pieces to work together.
> > > > >
> > > > > I *DO* respect that Timo has to keep the lights (and sauna)
> > > > > on in Finland. Maybe there's a super-secret (no advertised
> > > > > prices, "carrier-only" price list) with _Dovecot, Oy_
> > > > > wherein the above ARE actually available for something less
> > > > > than 6.022 x 10^23 Euros per centi-second of licencing fees.
> > > > >
> > > > > But please, level with us faithful users. Does this morass
> > > > > of Java B.S. actually work, and if not, please just
> > > > > deprecate and remove this moribund software, and stop trying
> > > > > to bury the only FTS plugin many of us HAVE actually gotten
> > > > > to work. (Pretty please?)
> > > > >
> > > > > I respect that Messr. Moreau has made an earnest effort to
> > > > > get this JAVA B.S. to actually work, as I have.
> > > > >
> > > > > He persevered where I'd given up. He's vocal about it, and
> > > > > now I'm chiming in that this ornate collection of
> > > > > switchblades only cuts those who try to use them.
> > > > >
> > > > > Respectfully,
> > > > > =M=
> > > >
> > > > Fascinating...
> > > >
> > > > SOLR says the following are powered by SOLR...
> > > >
> > > > https://wiki.apache.org/solr/PublicServers
> > > >
> > > > Perhaps if you could find out from that list which of them are
> > > > using SOLR in conjunction with Dovecot...
> > > >
> > > > food for thought...
Yes, but:

1 - Is there documentation of the main objects (fts_backend, mail_user, mailbox, etc.)?

2 - Which functions are mandatory?

3 - Search: presumably the FTS takes several parameters: the keyword(s), the user and mailbox, and the fields (to, from, body, etc.) to be included in the search. Which function in the plugin is called?

4 - Indexing: what is the logic? Does the FTS core just ask "index this email of this mailbox", or is it left to the plugin to work out which emails it has already indexed?

Thank you

On 2019-01-04 18:49, admin wrote:
> A starting point would be to have a look at the current FTS plugins:
>
> https://github.com/dovecot/core/tree/master/src/plugins/fts-solr
> and
> https://github.com/dovecot/core/tree/master/src/plugins/fts-squat
>
> -M
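Question 4 above describes two possible divisions of labour between the FTS core and a plugin. As a rough illustration only, here is a toy model in C of one arrangement in between: the backend remembers the highest UID it has indexed, and the caller feeds it only newer messages. Every name here (`toy_backend`, `toy_get_last_uid`, `toy_index_message`, `toy_core_index_mailbox`) is hypothetical and is not part of Dovecot; the thread does not confirm that Dovecot works this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these types and functions are NOT Dovecot's API. */
struct toy_backend {
    uint32_t last_uid;     /* highest message UID indexed so far */
    size_t indexed_count;  /* total number of messages indexed */
};

/* Backend side: report how far indexing has progressed. */
uint32_t toy_get_last_uid(const struct toy_backend *b)
{
    assert(b != NULL);
    return b->last_uid;
}

/* Backend side: "index" one message and remember its UID. */
void toy_index_message(struct toy_backend *b, uint32_t uid)
{
    assert(b != NULL);
    b->indexed_count++;
    if (uid > b->last_uid)
        b->last_uid = uid;
}

/* Caller side: feed only the messages newer than the backend's last UID.
 * Returns the number of messages handed to the backend. */
size_t toy_core_index_mailbox(struct toy_backend *b,
                              const uint32_t *uids, size_t n)
{
    uint32_t start = toy_get_last_uid(b);
    size_t fed = 0;

    for (size_t i = 0; i < n; i++) {
        if (uids[i] > start) {
            toy_index_message(b, uids[i]);
            fed++;
        }
    }
    return fed;
}
```

In this model a second pass over the same mailbox only touches messages that arrived since the first pass, which is the bookkeeping question 4 is asking about.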
Also, I would need a description of the functions the backend has to provide:

    struct fts_backend fts_backend_xapian = {
        .name = "xapian",
        .flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* -> what other flags? */

        {
            fts_backend_xapian_alloc,
            fts_backend_xapian_init,
            fts_backend_xapian_deinit,
            fts_backend_xapian_get_last_uid,
            fts_backend_xapian_update_init,
            fts_backend_xapian_update_deinit,
            fts_backend_xapian_update_set_mailbox,
            fts_backend_xapian_update_expunge,
            fts_backend_xapian_update_set_build_key,
            fts_backend_xapian_update_unset_build_key,
            fts_backend_xapian_update_build_more,
            fts_backend_xapian_refresh,
            fts_backend_xapian_rescan,
            fts_backend_xapian_optimize,
            fts_backend_default_can_lookup,
            fts_backend_xapian_lookup,
            fts_backend_xapian_lookup_multi,
            fts_backend_xapian_lookup_done
        }
    };

On 2019-01-04 20:33, Joan Moreau via dovecot wrote:
> Yes but:
>
> 1 - is there a documentation of the main object ? (fts_backend, mail_user, mailbox, etc..)
>
> 2 - What are the mandatory functions ?
>
> 3 - Search : Supposedly, the FTS shall have several parameters : the keyword(s), the user & mailbox, and the fields (to, from, body, etc..) to be included in the search. What is the function called in the plugin ?
>
> 4 - Indexing : Somehow, what is the logic ? fts core just ask to "index me this email of this mailbox" ? or this is delegated to the plugin to sort out which emails it has indexed yet or not ?
>
> Thank you
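The struct above is a table of function pointers that the core calls through. For readers unfamiliar with that pattern, a minimal self-contained sketch follows. The types and names (`toy_fts_backend`, `toy_fts_vfuncs`, `toy_xapian_*`, `toy_core_search`) are invented for illustration, the table is cut down to three entries, and none of it reflects the real signatures in Dovecot's headers, which should be checked in the source tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types: NOT Dovecot's real definitions. */
struct toy_fts_backend;

struct toy_fts_vfuncs {
    int (*init)(struct toy_fts_backend *backend);
    void (*deinit)(struct toy_fts_backend *backend);
    int (*lookup)(struct toy_fts_backend *backend, const char *key);
};

struct toy_fts_backend {
    const char *name;
    unsigned int flags;
    struct toy_fts_vfuncs v;
    int initialized;
};

/* Stub implementations a plugin author would fill in. */
int toy_xapian_init(struct toy_fts_backend *backend)
{
    backend->initialized = 1;
    return 0;
}

void toy_xapian_deinit(struct toy_fts_backend *backend)
{
    backend->initialized = 0;
}

/* Returns the number of hits, or -1 on error. The stub finds nothing. */
int toy_xapian_lookup(struct toy_fts_backend *backend, const char *key)
{
    if (!backend->initialized || key == NULL || strlen(key) == 0)
        return -1;
    return 0;
}

/* The plugin exports one filled-in table, as in the struct above. */
struct toy_fts_backend toy_backend_xapian = {
    .name = "xapian",
    .flags = 0,
    .v = { toy_xapian_init, toy_xapian_deinit, toy_xapian_lookup },
    .initialized = 0,
};

/* Caller side: only the function pointers are used, so any backend
 * that fills in the table can be swapped in. */
int toy_core_search(struct toy_fts_backend *backend, const char *key)
{
    assert(backend != NULL);
    if (backend->v.init(backend) < 0)
        return -1;
    int ret = backend->v.lookup(backend, key);
    backend->v.deinit(backend);
    return ret;
}
```

The point of the pattern is that the caller never names a Xapian-specific function; a plugin only has to supply the table.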