The main difference seems to come from "diffconfig.xml".

When I use yours, Solr deletes (!) schema.xml, creates a "managed-schema", and starts complaining about useless types (tdates, booleans, etc.) that are not needed for mail fields.

When I use mine (from the standard Arch distribution), it keeps things as they are (yeah!), does not complain about those useless types, and starts up properly.

I attach my diffconfig.

But these are configurations that everyone should adjust for their own use.

The main problem is: after some time of indexing from Dovecot, Dovecot returns errors (invalid SID, etc.) and Solr returns "out of range indexes" errors.

On 2019-01-02 07:49, Joan Moreau wrote:
> Hi
>
> Solr is a standard package in ArchLinux ("pacman -S solr"). The systemd installation script is included (and it launches /opt/solr/bin/solr.in.sh).
>
> Instance: sudo -u solr /opt/solr/bin/solr create -c dovecot -> this creates a separate folder with the default solrconfig.xml, schema.xml, etc.
>
> I made a symlink of the data folder to a second, much bigger drive (ext4).
>
> On 2018-12-31 14:09, Daniel Miller wrote:
>> On 12/29/2018 4:49 PM, Joan Moreau wrote:
>>> Also:
>>>
>>> - Java is 10.0.2
>>
>> Same as me.
>>
>>> - If I delete schema.xml but create only managed-schema, Solr refuses to start with a Java error "schema.xml missing"
>>
>> Ok...so we need to do some more digging.
>>
>> How did you install Solr? (I downloaded a "binary" installation and unpacked it)
>>
>> How did you create the dovecot instance? (I've provided explicit instructions for how I did it - did you follow those exactly or something different)?
>>
>> How are you starting Solr? (I use the provided "solr/bin/solr start" command, wrapped inside a systemd service).
>>
>> --
>> Daniel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: solrconfig.xml
Type: text/xml
Size: 52356 bytes
Desc: not available
URL: <https://dovecot.org/pipermail/dovecot/attachments/20190102/13ba225d/attachment-0001.xml>
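The schema.xml / managed-schema behaviour described above is consistent with which schema factory a given solrconfig.xml selects: with the managed factory, Solr converts schema.xml into managed-schema on startup and renames the original, while the classic factory reads schema.xml as-is. A minimal Python sketch for checking a config follows; the function name `schema_mode` is made up for illustration, and the assumption that a missing `<schemaFactory>` element means "managed" reflects recent Solr versions only.

```python
import xml.etree.ElementTree as ET


def schema_mode(solrconfig_xml: str) -> str:
    """Report which schema factory a solrconfig.xml string selects.

    Hypothetical helper, not part of Solr or Dovecot.
    """
    root = ET.fromstring(solrconfig_xml)
    factory = root.find("schemaFactory")  # direct child of <config>
    if factory is None:
        # Assumption: with no explicit declaration, recent Solr versions
        # fall back to the managed schema.
        return "managed (implicit default)"
    class_name = factory.get("class", "")
    if class_name.endswith("ClassicIndexSchemaFactory"):
        return "classic"   # schema.xml is read directly and left alone
    if class_name.endswith("ManagedIndexSchemaFactory"):
        return "managed"   # schema.xml is converted to managed-schema
    return f"unknown ({class_name})"


if __name__ == "__main__":
    sample = '<config><schemaFactory class="ClassicIndexSchemaFactory"/></config>'
    print(schema_mode(sample))
```

Running this against both solrconfig.xml files (read the file contents first, or adapt the function to use `ET.parse`) would show whether the two configs differ on this point.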
and the first line of the diff is:

< this file, see http://wiki.apache.org/solr/SolrConfigXml.
---
> this file, see http://wiki.apache.org/solr/SolrConfigXml.
38c38
< <luceneMatchVersion>6.4.1</luceneMatchVersion>
---
> <luceneMatchVersion>7.5.0</luceneMatchVersion>

So, are you running 6.4.1 or 7.5.0?

On 2019-01-02 08:12, Joan Moreau wrote:
> The main difference seems to come from "diffconfig.xml"
> [...]
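To compare the `luceneMatchVersion` values without eyeballing a diff, the element can be read from each solrconfig.xml directly. This is only a sketch: `lucene_match_version` is a hypothetical helper, and it assumes the value is a dotted numeric version such as the 6.4.1 and 7.5.0 shown in the diff.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def lucene_match_version(solrconfig_xml: str) -> Optional[Tuple[int, ...]]:
    """Return luceneMatchVersion as a tuple of ints, or None if absent.

    Hypothetical helper; assumes a dotted numeric value like "7.5.0".
    """
    root = ET.fromstring(solrconfig_xml)
    node = root.find("luceneMatchVersion")
    if node is None or not (node.text or "").strip():
        return None
    return tuple(int(part) for part in node.text.strip().split("."))


if __name__ == "__main__":
    first = "<config><luceneMatchVersion>6.4.1</luceneMatchVersion></config>"
    second = "<config><luceneMatchVersion>7.5.0</luceneMatchVersion></config>"
    a, b = lucene_match_version(first), lucene_match_version(second)
    if a != b:
        print("luceneMatchVersion differs:", a, "vs", b)
```

Tuples compare element by element, so `(6, 4, 1) < (7, 5, 0)` orders the versions correctly.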
The first result shows "no results" in Dovecot for any search by header (I typed an email address in the RoundCube search box, which uses Dovecot as its back end, which in turn uses Solr as its own back end).

So much effort for crappy results. Can't we really revive Squat? It is 2 lines of config, and not a single problem.

On January 2, 2019 08:16:33 Joan Moreau via dovecot <dovecot at dovecot.org> wrote:
> and the first line of the diff is:
> [...]
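When a header search comes back empty through RoundCube, one way to narrow it down is to ask the Solr core directly and see whether anything was indexed for that field at all. The sketch below only builds the query URL. The core name `dovecot` comes from the `solr create -c dovecot` command in this thread; the port 8983 and the field names `from`, `uid`, `box` and `user` are assumptions about the installation and the schema in use, so adjust them to match yours. `solr_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, core: str, field: str, term: str, rows: int = 10) -> str:
    """Build a Solr /select URL for an exact-phrase query on one field.

    Hypothetical helper; field names depend on the schema actually loaded.
    """
    # Escape backslashes and double quotes so the term stays one quoted phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = urlencode({
        "q": f'{field}:"{escaped}"',
        "fl": "uid,box,user",  # assumed field names
        "rows": rows,
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


if __name__ == "__main__":
    print(solr_select_url("http://localhost:8983", "dovecot", "from", "alice@example.com"))
```

Fetching the printed URL with a browser or curl shows whether Solr itself has matching documents, which separates an indexing problem from a problem between Dovecot and the webmail client.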
> The main problem is: after some time of indexing from Dovecot, Dovecot
> returns errors (invalid SID, etc.) and Solr returns "out of range
> indexes" errors

I've been watching the progress of this thread with no small concern, mainly because I've been tasked with providing a server-side email search facility with a budget and manpower level that comes down to mainly *1*, i.e., me.

I was expecting, given the strongly worded language about "just use lucene/SOLR" and "ignore squat", that I should invest time + effort into this JAVA nightmare that is SOLR.

I started with squat and another word-indexer system that used out-of-band (not a dovecot plugin) software to provide rapid (sub-second) searches through tens-of-GB-scale mailboxes.

Unlike what I was led to believe, the squat indexes worked surprisingly well, once you sorted out the odd resource size (ulimit-related) limitations (vsz & friends). I did notice that the "worst-case" search performance had worryingly high O(x) increases in time, but I'd not seen anything that was a dealbreaker. It goes without saying that various substring searches worked as expected, for the most part.

My experiences with SOLR were similar to Messr. Moreau's: lots of startup errors with the provided schemata files. Lots of JAVA nonsense issues. Lots of sensitivity to WHICH Java runtime, etc, etc. I finally fixed on a specific JVM, version of SOLR, and dovecot to find the "best" working combination, only to find that the searches didn't work out as expected. I expected to be able to do date-ranging based searches. Didn't work. I expected to search CONTENTS of emails, and despite many days of tweaks, I couldn't get it to index even the basics like filenames/types of attachments, so that I could expose attachment-based searching to my users.

So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the following functionality:

1) The ability to search for a string within any of the structured fields (from/subject) that returns correct results?

2) The ability to search for any string within the BODY of emails, including the MIME attachment boundaries?

3) The ability to do "ranging" searches for structures within emails that decompose to "dates" or other simple-numeric data?

OPTIONALLY, and this is probably way outside of the scope of the above, despite the fact that it's listed as a "selling point" of SOLR versus other full text search engines:

4) The ability to do searches against any attachments that are able to be post-processed and hyper-indexed by SOLR+Tika?

-------------

SOLR seems to have "brand cachet", so presumably it actually works (for somebody).

Dovecot has not a little "brand cachet", and for me, I have innate faith and trust in Timo and his software. I am no stranger to the "costs" of "free" software, in that you sacrifice your own blood, sweat, and tears just to get these disparate pieces to work together.

I *DO* respect that Timo has to keep the lights (and sauna) on in Finland. Maybe there's a super-secret (no advertised prices, "carrier-only" price list) with _Dovecot, Oy_ wherein the above ARE actually available for something less than 6.022 x 10^23 Euros per centi-second of licencing fees.

But please, level with us faithful users. Does this morass of Java B.S. actually work, and if not, please just deprecate and remove this moribund software, and stop trying to bury the only FTS plugin many of us HAVE actually gotten to work. (Pretty please?)

I respect that Messr. Moreau has made an earnest effort to get this JAVA B.S. to actually work, as I have.

He persevered where I'd given up. He's vocal about it, and now I'm chiming in that this ornate collection of switchblades only cuts those who try to use them.

Respectfully,
=M=
> On 02 January 2019 at 10:59 "M. Balridge" <dovecot at r.paypc.com> wrote:
>
> [...]
> But please, level with us faithful users. Does this morass of Java B.S.
> actually work, and if not, please just deprecate and remove this moribund
> software, and stop trying to bury the only FTS plugin many of us HAVE actually
> gotten to work. (Pretty please?)
> [...]

We do intend to polish fts-solr before we drop fts-squat. And even then, anyone is free to pick it up and continue the work, as it works as a plugin just fine, so it's not a matter of us just flushing it into oblivion. fts-squat is not really worth pursuing for us, since it would eat away effort from our current Dovecot FTS plugin, which unfortunately is not currently open-sourced.

Aki
On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:
> [...]
> SOLR seems to have "brand cachet", so presumably it actually works (for somebody).
> [...]

Fascinating...

SOLR says the following are powered by SOLR...

https://wiki.apache.org/solr/PublicServers

Perhaps if you could find out from that list which of them are using SOLR in conjunction with Dovecot...

food for thought...
I'm running 7.5.0. The solrconfig.xml file is what I've modified over time - I haven't started one from scratch for a while but perhaps I'll try.

Have you tried using the complete config that I sent you? With *all* the files I included - and *none* of yours?

--
Daniel

On 1/1/2019 4:12 PM, Joan Moreau wrote:
> The main difference seems to come from "diffconfig.xml"
> [...]
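One way to confirm that a core really is running with *all* of the reference files and none of the old ones is to compare the two conf directories file by file. The sketch below is an illustration only: `compare_conf_dirs` is a hypothetical helper, it looks at top-level regular files only (subdirectories such as language folders are ignored), and the directory paths are whatever your installation uses.

```python
import filecmp
import os


def compare_conf_dirs(reference: str, active: str) -> dict:
    """Compare top-level regular files of two Solr conf directories.

    Hypothetical helper; subdirectories are deliberately ignored.
    """
    def regular_files(path: str) -> set:
        return {n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n))}

    ref_files = regular_files(reference)
    act_files = regular_files(active)
    common = sorted(ref_files & act_files)
    # shallow=False forces a byte-by-byte comparison instead of trusting size/mtime.
    _match, mismatch, errors = filecmp.cmpfiles(reference, active, common, shallow=False)
    return {
        "missing": sorted(ref_files - act_files),    # in reference, absent from the core
        "extra": sorted(act_files - ref_files),      # left over in the core
        "different": sorted(mismatch + errors),      # present in both, contents differ
    }
```

An empty result in all three lists means the core's configuration matches the reference set exactly; anything under "extra" is a leftover file from the previous setup.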
On 1/2/2019 12:59 AM, M. Balridge wrote:
> So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a
> Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the
> following functionality:
>
> 1) The ability to search for a string within any of the structured fields
> (from/subject) that returns correct results?

Yes.

> 2) The ability to search for any string within the BODY of emails, including
> the MIME attachment boundaries?

Yes.

> 3) The ability to do "ranging" searches for structures within emails that
> decompose to "dates" or other simple-numeric data?

Dunno - I don't think I've needed that and I'm not sure how to do it. My mail clients are Thunderbird and AquaMail (on Android). If you'll give me either the desired Thunderbird steps or a telnet-based IMAP command I'm happy to test.

> OPTIONALLY, and this is probably way outside of the scope of the above,
> despite the fact that it's listed as a "selling point" of SOLR versus other
> full text search engines:
>
> 4) The ability to do searches against any attachments that are able to be
> post-processed and hyper-indexed by SOLR+Tika?

Haven't tried.

> SOLR seems to have "brand cachet", so presumably it actually works (for somebody).

It works - it just sometimes needs more effort to set up than it should.

> Dovecot has not a little "brand cachet", and for me, I have innate faith and
> trust in Timo and his software.

I think we're all in agreement here.

> But please, level with us faithful users. Does this morass of Java B.S.
> actually work, and if not, please just deprecate and remove this moribund
> software, and stop trying to bury the only FTS plugin many of us HAVE actually
> gotten to work. (Pretty please?)
>
> I respect that Messr. Moreau has made an earnest effort to get this JAVA B.S.
> to actually work, as I have.
>
> He persevered where I'd given up. He's vocal about it, and now I'm chiming in
> that this ornate collection of switchblades only cuts those who try to use them.

Short answer - it actually works. Longer answer - I've gone through a hate/love/hate/like relationship with Solr myself. The transition from v3 to v4 was a major headache - and I gave up for a while. But versions 6 & 7 have been pretty good for me. I'm neither a Dovecot nor a Solr developer - just enough of a fiddler to get them working to fulfill my own needs.

If my unreliable memory serves, I believe the Dovecot fts-solr plugin hasn't needed to change much (I recall one significant change required when Solr changed its protocol - I think an XML/JSON thing). So having a stable interface lets Timo & Co. forget about ongoing FTS development and continue focusing on things not provided by other tools. Hopefully they'll revisit SIS...

I recall reading something about the Lucene library (which Squat & Solr are based on) and again my memory is that the C version(s) weren't getting maintained as well as might be desired. I think having the Solr/Lucene team focusing on Java development was another point of consideration for Dovecot's squat - but I could be totally off here.

Based on the errors reported by Joan, I believe that system's problems are due to configuration - either Solr, Dovecot, or both. They don't sound like Java-related issues (which are a *major* pain to deal with!). I've provided a copy of what is a working configuration *for me*. I'm happy to continue helping as best I can - and if Joan, you, or anyone else would like my aid I'll do my best. If you're crazy I-mean-trusting enough to have me SSH or remote view to your system I'm willing to take a look. I've had enough people help me over the years for various packages that I'd like to pay it forward where I can.

--
Daniel
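For the telnet-based IMAP test asked for above, the three baseline searches map onto standard IMAP SEARCH keys: FROM (or SUBJECT) for structured fields, BODY for message text, and SINCE/BEFORE for date ranges. The sketch below only generates the command lines to type after logging in and selecting a mailbox; `baseline_searches` and the tags `a1` to `a3` are made up for illustration, and whether each key is answered by the FTS backend or by Dovecot's own index is not something this thread establishes.

```python
from datetime import date

# Month names are hard-coded so the output does not depend on the system locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d: date) -> str:
    """Format a date the way IMAP SEARCH expects it, e.g. 1-Jan-2019."""
    return f"{d.day}-{_MONTHS[d.month - 1]}-{d.year}"


def baseline_searches(sender: str, body_text: str, since: date, before: date) -> list:
    """Return tagged IMAP SEARCH commands for the three baseline cases.

    Hypothetical helper; tags and argument names are illustrative.
    """
    return [
        f'a1 SEARCH FROM "{sender}"',       # 1) structured header field
        f'a2 SEARCH BODY "{body_text}"',    # 2) string in the message body
        f"a3 SEARCH SINCE {imap_date(since)} BEFORE {imap_date(before)}",  # 3) date range
    ]


if __name__ == "__main__":
    for line in baseline_searches("alice@example.com", "invoice",
                                  date(2019, 1, 1), date(2019, 2, 1)):
        print(line)
```

SINCE is inclusive and BEFORE is exclusive, so the example range covers all of January 2019.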