<!doctype html> <html> <head> <meta charset="UTF-8"> </head> <body> <div> I would recommend making this a standalone plugin for now instead of trying to keep it in core fts. </div> <div> <br> </div> <div> Aki </div> <blockquote type="cite"> <div> On 11 January 2019 at 18:40 Joan Moreau via dovecot < <a href="mailto:dovecot@dovecot.org">dovecot@dovecot.org</a>> wrote: </div> <div> <br> </div> <div> <br> </div> <div> I managed to deal with the namespace issue (updated Makefile.am). </div> <div> <br> </div> <div> However, I now get: </div> <div> <br> </div> <div> ../../../src/lib/compat.h:207:19: error: conflicting declaration of </div> <div> 'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage </div> <div> # define pread i_my_pread </div> <div> ^~~~~~~~~~ </div> <div> ../../../src/lib/compat.h:210:9: note: previous declaration with 'C++' </div> <div> linkage </div> <div> ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset); </div> <div> ^~~~~~~~~~ </div> <div> ../../../src/lib/compat.h:208:20: error: conflicting declaration of </div> <div> 'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' </div> <div> linkage </div> <div> # define pwrite i_my_pwrite </div> <div> <br> </div> <div> Any help welcome. </div> <div> <br> </div> <div> Hi, </div> <div> <br> </div> <div> I figured out the "namespace" issue. </div> <div> <br> </div> <div> Remaining questions are: </div> <div> <br> </div> <div> 1 - What does "subargs" represent in mail_search_args? </div> <div> <br> </div> <div> 2 - For rescan: who is responsible for passing the emails again? Is </div> <div> the Dovecot core sending all the emails to index again? Or shall the fts </div> <div> somehow access the mailbox and read all emails? Wouldn't just </div> <div> saying "delete all indexes and get_last_uid is now 0" be the easy way?
Or </div> <div> must the fts process all emails (and block the current thread, as a </div> <div> mailbox may be quite large)? </div> <div> <br> </div> <div> 3 - For get_last_uid: this is still very unclear. "If there is a </div> <div> gap, then indexer first indexes all the missing" -> this means that at a </div> <div> certain point the indexer may be rebuilding a previous email, so the *last* uid </div> <div> is something different from the max. And how does the indexer know whether there </div> <div> is a gap without calling the fts backend (which it does not, as there is </div> <div> no function for that)? </div> <div> <br> </div> <div> 4 - How to update configure.ac and additional files to add the </div> <div> "--with-xapian" option, which will test for libxapian presence and add it to the </div> <div> build? </div> <div> <br> </div> <div> Thank you </div> <div> <br> </div> <div> On 2019-01-08 04:24, Timo Sirainen wrote: </div> <div> <br> </div> <div> On 7 Jan 2019, at 16.05, Joan Moreau via dovecot < <a href="mailto:dovecot@dovecot.org">dovecot@dovecot.org</a>> </div> <div> wrote: </div> <div> Hi </div> <div> <br> </div> <div> Anyone to answer specifically? </div> <div> <br> </div> <div> Q1 : get_last_uid -> Is this the last UID indexed (which may not be the </div> <div> greatest value), or the greatest value (which may not be the latest)? (The </div> <div> code of existing plugins is unclear about this; Solr looks for the </div> <div> greatest, for instance.) </div> <div> All the mails are always supposed to be indexed from the beginning to </div> <div> the last indexed mail. If there's a gap, indexer first indexes all the </div> <div> missing mails. So the latest UID is supposed to be the greatest UID. </div> <div> (Supporting out-of-order indexing would be rather difficult to keep </div> <div> track of.) </div> <div> <br> </div> <div> Q2 : When indexing an email, the data is not passed by "build_key". Why </div> <div> so? What is the link with "build_more"?
</div> <div> The idea is that it calls something like: </div> <div> <br> </div> <div> - build_key(type=hdr, hdr_name=From) </div> <div> - build_more(" <a href="mailto:tss@iki.fi">tss@iki.fi</a>") </div> <div> - build_key(type=hdr, hdr_name=Subject) </div> <div> - build_more("Re: Solr -> Xapian ?") </div> <div> - build_key(type=body_part) </div> <div> - build_more("message body piece") </div> <div> - build_more("message body piece2") </div> <div> ... </div> <div> <br> </div> <div> Q3 : Searching/Lookup : The header in which to look (must be at </div> <div> least among "cc, to, from, subject, body") does not appear in the </div> <div> 'struct' data. Where to find it? </div> <div> lookup() gets struct mail_search_arg *args, which contains the entire </div> <div> IMAP SEARCH query. This could be used for more or less complex query </div> <div> builders. </div> <div> <br> </div> <div> In case of a single header search, you should have </div> <div> args->args->hdr_field_name contain the header name and </div> <div> args->args->value.str contain the content you're searching for. </div> <div> <br> </div> <div> Q4 : Refresh : this is very unclear. How could the index not show the </div> <div> "latest" view? What is the real meaning of this function? </div> <div> In case of Xapian it might not matter if it automatically refreshes its </div> <div> indexes between each query. But with some other indexes this could </div> <div> happen: </div> <div> <br> </div> <div> - IMAP session is opened </div> <div> - IMAP SEARCH is run, which opens and searches the index </div> <div> - a new mail is delivered to the mailbox and indexed </div> <div> - IMAP SEARCH is run. Without refresh() it doesn't see the newly </div> <div> indexed mail and doesn't include it in the search results. </div> <div> <br> </div> <div> Q5 : Rescan : is it just about removing all indexes for a specific </div> <div> mailbox? </div> <div> It's run when "doveadm fts rescan" is run manually.
Usually that's only </div> <div> run manually to fix up some brokenness. So it's intended to verify that </div> <div> the current mailbox contents match the FTS indexes: </div> <div> - If there are any mails in FTS index that no longer exist in the </div> <div> actual mailbox, delete those mails from FTS </div> <div> - If FTS is missing any mails in the middle of the mailbox, make sure </div> <div> that the next mailbox indexing will index those missing mails. I think </div> <div> currently this basically means reindexing all the mails since the first </div> <div> missing mail, even the mails that are already in the index. </div> <div> <br> </div> <div> fts-lucene implements this, but other FTS backends are lazy and simply </div> <div> rebuild all mails. Actually fts-solr is bad because it doesn't even </div> <div> delete the extra mails. </div> <div> <br> </div> <div> Q6 : lookup_multi : isn't the function the same for all plugins (see </div> <div> below)? And finally, for fts_backend_xxxx_lookup_multi, why is that </div> <div> backend dependent? </div> <div> This function is called only when searching in virtual folders. So for </div> <div> example the virtual "All mails" folder, which would contain all mails in </div> <div> all folders. In that case the boxes[] would contain a list of all the user's </div> <div> folders, except Trash and Spam. If lookup_multi() isn't implemented </div> <div> (left as NULL), the search is run separately via lookup() for each </div> <div> folder. With lookup_multi() there can be just one lookup, and the </div> <div> backend can filter only the wanted folders and return them directly. So </div> <div> it's an optimization for FTS indexes that support user-global searches </div> <div> rather than only per-folder searches.
</div> <div> <br> </div> <div> static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend, </div> <div> struct mailbox *const boxes[], struct mail_search_arg *args, enum </div> <div> fts_lookup_flags flags, struct fts_multi_result *result) </div> <div> { </div> <div> struct xapian_fts_backend_update_context *ctx = </div> <div> (struct xapian_fts_backend_update_context *)_ctx; </div> <div> <br> </div> <div> int i=0; </div> <div> <br> </div> <div> while(boxes[i]!=NULL) </div> <div> { </div> <div> if(fts_backend_xapian_lookup(backend,box[i],args,flags,result->box_results[i])<0) </div> <div> return -1; </div> <div> i++; </div> <div> } </div> <div> return 0; </div> <div> } </div> <div> See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it </div> <div> basically does this. </div> <div> <br> </div> <div> For "rescan" and "optimize", wouldn't it be the Dovecot core that </div> <div> indicates which mails are to be dismissed (expunged), or asks again for indexing a </div> <div> particular (or all) uid? Why would the backend be aware of the </div> <div> transactions on the mailbox? </div> <div> rescan() is about fixing up a more or less broken index, or simply to </div> <div> verify that it's all ok. So core doesn't know what messages exist in the </div> <div> FTS index and can't request specific reindexing or expunging. I guess an </div> <div> alternative API could have been to have functions that iterate through </div> <div> all mails in the index, and use that to implement rescan in core. Now </div> <div> thinking about it, that sounds like a simpler and better way. </div> <div> <br> </div> <div> optimize() is currently done only when explicitly running "doveadm fts </div> <div> optimize", which requests running a slower index optimization. Depends </div> <div> on the FTS backend whether this is useful or not.
</div> <div> <br> </div> <div> There is already "fts_backend_xxx_update_expunge", so I believe the </div> <div> management of the expunged messages is *NOT* in the backend, right? </div> <div> Normally when mails are expunged, update_expunge() is called to notify </div> <div> FTS backend that it should delete the mail also from FTS index. </div> <div> <br> </div> <div> .flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT,*-> what other flags ?* </div> <div> You probably want to use FTS_BACKEND_FLAG_FUZZY_SEARCH only like Solr. </div> <div> See enum fts_backend_flags in fts-api-private.h </div> </blockquote> <div> <br> </div> <div class="io-ox-signature"> --- <br>Aki Tuomi </div> </body> </html>
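The build_key()/build_more() sequence described in this message can be read as "start a new field" followed by "append chunks to the current field". Below is a minimal sketch of that pattern in plain C. Everything in it (`sketch_ctx`, `sketch_build_key`, `sketch_build_more`, the fixed buffer sizes) is a hypothetical stand-in invented for illustration; it is not the Dovecot FTS API, whose real callbacks take different arguments.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key types, loosely mirroring "hdr" and "body_part" above. */
enum sketch_key_type { SKETCH_KEY_HDR, SKETCH_KEY_BODY_PART };

/* Hypothetical per-message indexing state. */
struct sketch_ctx {
	char field[64];            /* field being filled, e.g. "Subject" or "body" */
	char text[512];            /* text accumulated for the current field */
	size_t text_len;
	unsigned int fields_done;  /* number of completed fields */
};

static void sketch_init(struct sketch_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
}

/* Finish the current field. A real backend would pass field/text to its
   indexing library here; this sketch only counts the field and resets. */
static void sketch_flush(struct sketch_ctx *ctx)
{
	if (ctx->field[0] == '\0')
		return;
	ctx->fields_done++;
	ctx->field[0] = '\0';
	ctx->text[0] = '\0';
	ctx->text_len = 0;
}

/* "build_key": announces which field the following data belongs to. */
static void sketch_build_key(struct sketch_ctx *ctx, enum sketch_key_type type,
			     const char *hdr_name)
{
	const char *name = "body";

	if (type == SKETCH_KEY_HDR)
		name = hdr_name != NULL ? hdr_name : "";
	sketch_flush(ctx);
	snprintf(ctx->field, sizeof(ctx->field), "%s", name);
}

/* "build_more": appends one chunk of data to the current field,
   truncating silently if the fixed-size buffer is full. */
static void sketch_build_more(struct sketch_ctx *ctx, const char *data)
{
	size_t room = sizeof(ctx->text) - 1 - ctx->text_len;
	size_t n = strlen(data);

	if (n > room)
		n = room;
	memcpy(ctx->text + ctx->text_len, data, n);
	ctx->text_len += n;
	ctx->text[ctx->text_len] = '\0';
}
```

With this shape, the data never needs to be passed to the key call itself: the key only selects the destination, and any number of chunks may follow before the next key arrives.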
There is no point in a separate plugin; the purpose is to replace squat as the default fts (solr being a nightmare).

On 2019-01-11 18:23, Aki Tuomi wrote:
> I would recommend making this a standalone plugin for now instead of trying to keep it in core fts.
>
> Aki
The patch below resolves the compilation error:

$ diff -p compat.h compat.h.joan
*** compat.h 2019-01-11 20:21:00.726625427 +0100
--- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
*************** struct iovec;
*** 202,207 ****
--- 202,211 ----
  ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
  #endif
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+
  #if !defined(HAVE_PREAD) || defined(PREAD_WRAPPERS) || defined(PREAD_BROKEN)
  # ifndef IN_COMPAT_C
  # define pread i_my_pread
*************** ssize_t i_my_pread(int fd, void *buf, si
*** 211,216 ****
--- 215,225 ----
  ssize_t i_my_pwrite(int fd, const void *buf, size_t count, off_t offset);
  #endif
+ #ifdef __cplusplus
+ }
+ #endif
+
+
  #ifndef HAVE_SETEUID
  # define seteuid i_my_seteuid
  int i_my_seteuid(uid_t euid);

To integrate the plugin into the source tree, the following diffs do the job:

$ diff -p configure.ac configure.ac.joan
*** configure.ac 2019-01-11 20:19:47.905942264 +0100
--- configure.ac.joan 2019-01-11 17:54:58.433381828 +0100
*************** AS_HELP_STRING([--with-solr], [Build wit
*** 172,177 ****
--- 172,184 ----
  TEST_WITH(solr, $withval),
  want_solr=no)
+ AC_ARG_WITH(xapian,
+ AS_HELP_STRING([--with-xapian], [Build with Xapian full text search support]),
+ TEST_WITH(xapian, $withval),
+ want_xapian=auto)
+ AM_CONDITIONAL(BUILD_XAPIAN, test "$want_xapian" = "yes")
+
+
  AC_ARG_WITH(sodium,
  AS_HELP_STRING([--with-sodium], [Build with libsodium support (enables argon2, default: auto)]),
  TEST_WITH(sodium, $withval),
*************** DOVECOT_WANT_SOLR
*** 746,751 ****
--- 753,759 ----
  DOVECOT_WANT_CLUCENE
  DOVECOT_WANT_STEMMER
  DOVECOT_WANT_TEXTCAT
+ DOVECOT_WANT_XAPIAN
  DOVECOT_WANT_ICU
*************** fi
*** 757,762 ****
--- 765,774 ----
  if test $have_solr = no; then
  not_fts="$not_fts solr"
  fi
+ if test $have_xapian = no; then
+ not_fts="$not_fts xapian"
+ fi
+
  dnl **
  dnl ** Settings
*************** src/plugins/fs-compress/Makefile
*** 899,904 ****
--- 911,917 ----
  src/plugins/fts/Makefile
  src/plugins/fts-lucene/Makefile
  src/plugins/fts-solr/Makefile
+ src/plugins/fts-xapian/Makefile
  src/plugins/fts-squat/Makefile
  src/plugins/last-login/Makefile
  src/plugins/lazy-expunge/Makefile

$ diff -p Makefile.am Makefile.am.joan
*** Makefile.am 2019-01-11 20:22:23.910740574 +0100
--- Makefile.am.joan 2019-01-11 17:51:19.051153270 +0100
*************** DISTCLEANFILES = \
*** 99,105 ****
  distcheck-hook:
  if which scan-build > /dev/null; then \
  cd $(distdir)/_build; \
! scan-build -o scan-reports ../configure --with-ldap=auto --with-pgsql=auto --with-mysql=auto --with-sqlite=auto --with-solr=auto --with-gssapi=auto --with-libwrap=auto; \
  rm -rf scan-reports; \
  scan-build -o scan-reports make 2>&1 || exit 1; \
  if ! rmdir scan-reports 2>/dev/null; then \
--- 99,105 ----
  distcheck-hook:
  if which scan-build > /dev/null; then \
  cd $(distdir)/_build; \
! scan-build -o scan-reports ../configure --with-ldap=auto --with-pgsql=auto --with-mysql=auto --with-sqlite=auto --with-solr=auto --with-xapian=auto --with-gssapi=auto --with-libwrap=auto; \
  rm -rf scan-reports; \
  scan-build -o scan-reports make 2>&1 || exit 1; \
  if ! rmdir scan-reports 2>/dev/null; then \

What about the other questions?

1 - What does "subargs" represent in mail_search_args?

2 - For rescan: who is responsible for passing the emails again? Is the Dovecot core sending all the emails to index again? Or shall the fts somehow access the mailbox and read all emails? Wouldn't just saying "delete all indexes and get_last_uid is now 0" be the easy way? Or must the fts process all emails (and block the current thread, as a mailbox may be quite large)?

3 - For get_last_uid: this is still very unclear. "If there is a gap, then indexer first indexes all the missing" -> this means that at a certain point the indexer may be rebuilding a previous email, so the *last* uid is something different from the max.
And how does the indexer know whether there is a gap without calling the fts backend (which it does not, as there is no function for that)?

Thank you
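The compat.h patch in this message wraps the pread/pwrite wrapper declarations in an `extern "C"` guard, so that a C++ translation unit sees them with C linkage. A minimal sketch of that guard pattern follows. The function `sketch_buf_pread` is hypothetical and invented for this illustration (it copies from a memory buffer rather than a file descriptor so the example stays portable); it is not part of Dovecot, and this does not claim to reproduce the exact cause of the error in compat.h.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* When this code is compiled as C++, the guard gives the declaration
   C linkage; when compiled as C, the preprocessor removes the guard. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical pread-like helper: copy up to count bytes from src,
   starting at offset, into buf. Returns the number of bytes copied. */
ssize_t sketch_buf_pread(const void *src, size_t src_len,
			 void *buf, size_t count, size_t offset);

#ifdef __cplusplus
}
#endif

/* The definition matches the guarded declaration above, so it keeps the
   same linkage in both C and C++ builds. */
ssize_t sketch_buf_pread(const void *src, size_t src_len,
			 void *buf, size_t count, size_t offset)
{
	if (offset >= src_len)
		return 0; /* reading past the end yields 0 bytes, like EOF */
	if (count > src_len - offset)
		count = src_len - offset;
	memcpy(buf, (const char *)src + offset, count);
	return (ssize_t)count;
}
```

The point of the pattern is that every declaration of a given function must agree on linkage: if one declaration is seen inside an `extern "C"` block and another outside it, a C++ compiler reports a conflict of the kind quoted earlier in the thread.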