Patrik Peng
2022-Feb-09 11:21 UTC
Different handling of upper and lower case while indexing/searching with Solr
Hello there

We stumbled upon a user account with Solr FTS that returned no search results for any search query.
Further investigation revealed a mismatch between indexing mails and querying the index.
The user name contains upper and lower case characters (e.g. Some.User at domain.net).

When new mail is indexed for this user, the user name used for Solr's `user` and `id` fields is transformed into lowercase, as shown in the Solr log:

    webapp=/solr path=/update params={...}{add=[8543/426f3b0348d03451a3fb00008ba2b673/some.user at domain.net (1724281617442144256), ... (162 adds)]} 0 44298

This can be confirmed by manually querying Solr. The Solr schema in use performs no transformation for the affected fields.

When a search request is performed via IMAP, Dovecot queries Solr with the original user name:

    GET /solr/dovecot_fts_popimap/select?wt=json&f...&fq=%2Bbox:1a30ec359dce3451b8e600008ba2b673+%2Buser:Some.User at domain.net HTTP/1.1"

Which (correctly) returns zero results.

To summarize, I suspect Dovecot transforms any user name to lower case while indexing mails, but not when querying for results.

Is this a bug, or caused by misconfiguration?

Regards
Patrik
Patrik Peng
2022-Feb-09 11:31 UTC
Different handling of upper and lower case while indexing/searching with Solr
Woops, this time with better formatting.

On 09.02.22 12:21, Patrik Peng wrote:
> Hello there
>
> We stumbled upon a user account with Solr FTS that returned no
> search results for any search query.
> Further investigation revealed a mismatch between indexing mails and
> querying the index.
> The user name contains upper and lower case characters (e.g.
> Some.User at domain.net).
>
> When new mail is indexed for this user, the user name used for Solr's
> `user` and `id` fields is transformed into lowercase, as shown in the
> Solr log:
>
>     webapp=/solr path=/update params={...}{add=[8543/426f3b0348d03451a3fb00008ba2b673/some.user at domain.net (1724281617442144256), ... (162 adds)]} 0 44298
>
> This can be confirmed by manually querying Solr. The Solr schema in use
> performs no transformation for the affected fields.
>
> When a search request is performed via IMAP, Dovecot queries Solr with
> the original user name:
>
>     GET /solr/dovecot_fts_popimap/select?wt=json&f...&fq=%2Bbox:1a30ec359dce3451b8e600008ba2b673+%2Buser:Some.User at domain.net HTTP/1.1"
>
> Which (correctly) returns zero results.
>
> To summarize, I suspect Dovecot transforms any user name to lower case
> while indexing mails, but not when querying for results.
>
> Is this a bug, or caused by misconfiguration?
>
> Regards
> Patrik
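
For anyone following along, the suspected mismatch can be illustrated with a minimal Python sketch. This is not Dovecot or Solr code: `ExactMatchIndex`, `normalize_user`, the address and the document id `doc-1` are hypothetical stand-ins. The only assumption is that the `user` field is matched exactly, as it would be with a schema that performs no transformation on it.

```python
class ExactMatchIndex:
    """Toy stand-in for an untransformed, exact-match string field."""

    def __init__(self):
        self._docs = {}  # user value -> list of document ids

    def add(self, user, doc_id):
        self._docs.setdefault(user, []).append(doc_id)

    def search(self, user):
        # Exact, case-sensitive comparison: no lowercasing happens here.
        return list(self._docs.get(user, []))


def normalize_user(name):
    # Hypothetical normalization step: lowercase the whole user name.
    return name.lower()


index = ExactMatchIndex()
login = "Some.User@domain.net"  # illustrative mixed-case user name

# Indexing path: the user name is lowercased before it is stored.
index.add(normalize_user(login), "doc-1")

# Query path as observed: the original mixed-case name is used, so nothing matches.
print(index.search(login))                  # []

# Applying the same normalization on the query side finds the document.
print(index.search(normalize_user(login)))  # ['doc-1']
```

The point of the sketch is only that both paths have to agree on one form of the user name. Whether that is achieved by normalizing on both sides or on neither depends on the actual setup.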