<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<div>
I would recommend making this a standalone plugin for now instead of trying
to keep it in core fts.
</div>
<div>
<br>
</div>
<div>
Aki
</div>
<blockquote type="cite">
<div>
On 11 January 2019 at 18:40 Joan Moreau via dovecot <
<a
href="mailto:dovecot@dovecot.org">dovecot@dovecot.org</a>>
wrote:
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div>
I managed to deal with the namespace issue (updated Makefile.am).
</div>
<div>
<br>
</div>
<div>
However, I now hit this error:
</div>
<div>
<br>
</div>
<div>
../../../src/lib/compat.h:207:19: error: conflicting declaration of
</div>
<div>
'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C'
linkage
</div>
<div>
# define pread i_my_pread
</div>
<div>
^~~~~~~~~~
</div>
<div>
../../../src/lib/compat.h:210:9: note: previous declaration with
'C++'
</div>
<div>
linkage
</div>
<div>
ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset);
</div>
<div>
^~~~~~~~~~
</div>
<div>
../../../src/lib/compat.h:208:20: error: conflicting declaration of
</div>
<div>
'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with
'C'
</div>
<div>
linkage
</div>
<div>
# define pwrite i_my_pwrite
</div>
<div>
<br>
</div>
<div>
Any help welcome
</div>
<div>
<br>
</div>
<div>
Hi,
</div>
<div>
<br>
</div>
<div>
I figured out the "namespace" issue
</div>
<div>
<br>
</div>
<div>
The remaining questions are:
</div>
<div>
<br>
</div>
<div>
1 - What does "subargs" represent in mail_search_args?
</div>
<div>
<br>
</div>
<div>
2 - For rescan: who is responsible for passing the emails again? Does
</div>
<div>
the Dovecot core send all the emails to be indexed again, or should the
</div>
<div>
FTS backend somehow access the mailbox and read all the emails itself?
</div>
<div>
Wouldn't simply saying "delete the whole index; get_last_uid is now 0"
</div>
<div>
be the easy way? Or must the FTS backend process all the emails itself
</div>
<div>
(and block the current thread, as a mailbox may be quite large)?
</div>
<div>
<br>
</div>
<div>
3 - For get_last_uid: the explanation is still unclear to me. "If there
</div>
<div>
is a gap, then indexer first indexes all the missing" means that at a
</div>
<div>
certain point the indexer may be rebuilding an earlier email, so the
</div>
<div>
*last* UID is something different from the max. And how does the indexer
</div>
<div>
know whether there is a gap without calling the FTS backend (which it
</div>
<div>
does not do, as there is no function for that)?
</div>
<div>
<br>
</div>
<div>
4 - How do I update configure.ac and the additional files to add a
</div>
<div>
"--with-xapian" option which tests for the presence of libxapian and adds
</div>
<div>
it to the build?
</div>
<div>
<br>
</div>
<div>
Thank you
</div>
<div>
<br>
</div>
<div>
On 2019-01-08 04:24, Timo Sirainen wrote:
</div>
<div>
<br>
</div>
<div>
On 7 Jan 2019, at 16.05, Joan Moreau via dovecot <
<a
href="mailto:dovecot@dovecot.org">dovecot@dovecot.org</a>>
</div>
<div>
wrote:
</div>
<div>
Hi
</div>
<div>
<br>
</div>
<div>
Can anyone answer these specifically?
</div>
<div>
<br>
</div>
<div>
Q1 : get_last_uid -> Is this the last UID indexed (which may not be the
</div>
<div>
greatest value), or the greatest value (which may not be the latest)? (The
</div>
<div>
code of the existing plugins is unclear about this; Solr looks for the
</div>
<div>
greatest, for instance.)
</div>
<div>
All the mails are always supposed to be indexed from the beginning to
</div>
<div>
the last indexed mail. If there's a gap, indexer first indexes all the
</div>
<div>
missing mails. So the latest UID is supposed to be the greatest UID.
</div>
<div>
(Supporting out-of-order indexing would be rather difficult to keep
</div>
<div>
track of.)
</div>
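<div>
To make the contiguity rule concrete, here is a minimal sketch in C. It does not use Dovecot's API: first_unindexed_pos() and the plain UID array are hypothetical stand-ins. It assumes the backend reports only the last indexed UID, so every mail with a higher UID is treated as not yet indexed.
</div>
<pre>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Dovecot: given the mailbox UIDs in
 * ascending order and the last UID the FTS backend reports as indexed,
 * return the position of the first mail that still needs indexing.
 * Returns count when nothing is left to index.
 */
size_t first_unindexed_pos(const uint32_t *uids, size_t count,
                           uint32_t last_indexed_uid)
{
	size_t i;

	for (i = 0; i < count; i++) {
		/* UIDs must be strictly ascending for the rule to hold. */
		assert(i == 0 || uids[i - 1] < uids[i]);
		if (uids[i] > last_indexed_uid)
			break;
	}
	return i;
}
```
</pre>
<div>
With this model a single number is enough state: if the backend reports 0, indexing starts from the first mail, which is why "last" and "greatest" are expected to coincide.
</div>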
<div>
<br>
</div>
<div>
Q2 : When indexing an email, the data is not passed via "build_key". Why
</div>
<div>
is that? What is the link with "build_more"?
</div>
<div>
The idea is that it calls something like:
</div>
<div>
<br>
</div>
<div>
- build_key(type=hdr, hdr_name=From)
</div>
<div>
- build_more("
<a href="mailto:tss@iki.fi">tss@iki.fi</a>")
</div>
<div>
- build_key(type=hdr, hdr_name=Subject)
</div>
<div>
- build_more("Re: Solr -> Xapian ?")
</div>
<div>
- build_key(type=body_part)
</div>
<div>
- build_more("message body piece")
</div>
<div>
- build_more("message body piece2")
</div>
<div>
...
</div>
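<div>
The call sequence above can be sketched as a small state machine. This is only an illustration in C under stated assumptions: the sketch_* names, the fixed-size buffers and the "field:text" record format are all hypothetical, and the real callbacks in the FTS API take different arguments.
</div>
<pre>
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the real key types; names are illustrative. */
enum sketch_key_type { SKETCH_KEY_HDR, SKETCH_KEY_BODY_PART };

struct sketch_update_ctx {
	enum sketch_key_type type;
	char hdr_name[64];   /* empty for body parts */
	char out[1024];      /* "field:text" records, one per line */
};

/* Called once per header or body part: remembers what the next data is. */
void sketch_build_key(struct sketch_update_ctx *ctx,
                      enum sketch_key_type type, const char *hdr_name)
{
	ctx->type = type;
	snprintf(ctx->hdr_name, sizeof(ctx->hdr_name), "%s",
	         hdr_name != NULL ? hdr_name : "");
}

/* Called one or more times after build_key() with the actual text. */
int sketch_build_more(struct sketch_update_ctx *ctx, const char *data)
{
	const char *field = ctx->type == SKETCH_KEY_HDR ? ctx->hdr_name : "body";
	size_t used = strlen(ctx->out);
	int n = snprintf(ctx->out + used, sizeof(ctx->out) - used,
	                 "%s:%s\n", field, data);

	/* Fail instead of silently truncating the record. */
	if (n < 0 || (size_t)n >= sizeof(ctx->out) - used) {
		ctx->out[used] = '\0';
		return -1;
	}
	return 0;
}
```
</pre>
<div>
The context must be zeroed before first use. The point of the split is that build_key() carries no text: it only says which field the following build_more() chunks belong to, so a large body can arrive in several pieces.
</div>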
<div>
<br>
</div>
<div>
Q3 : Searching/Lookup : The header in which to look (it must be at
</div>
<div>
least one of "cc, to, from, subject, body") does not appear in the
</div>
<div>
'struct' data. Where do I find it?
</div>
<div>
lookup() gets struct mail_search_arg *args, which contains the entire
</div>
<div>
IMAP SEARCH query. This could be used for more or less complex query
</div>
<div>
builders.
</div>
<div>
<br>
</div>
<div>
In case of a single header search, you should have
</div>
<div>
args->args->hdr_field_name contain the header name and
</div>
<div>
args->args->value.str contain the content you're searching for.
</div>
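<div>
As a rough illustration of reading the header name and search string from the query, here is a sketch in C. The sketch_search_arg struct is a deliberately simplified, hypothetical stand-in: it only mirrors the hdr_field_name and value.str fields named above plus a next pointer, and the real struct mail_search_arg has many more fields and argument types.
</div>
<pre>
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Simplified stand-in for a search argument (hypothetical type). */
enum sketch_arg_type { SKETCH_ARG_HEADER, SKETCH_ARG_BODY, SKETCH_ARG_TEXT };

struct sketch_search_arg {
	struct sketch_search_arg *next;
	enum sketch_arg_type type;
	const char *hdr_field_name;   /* set for header searches only */
	struct { const char *str; } value;
};

/* Return the text searched for in header `name`, or NULL if the query
   has no such header term. Header names compare case-insensitively. */
const char *sketch_find_header_term(const struct sketch_search_arg *args,
                                    const char *name)
{
	const struct sketch_search_arg *arg;

	for (arg = args; arg != NULL; arg = arg->next) {
		if (arg->type == SKETCH_ARG_HEADER &&
		    arg->hdr_field_name != NULL &&
		    strcasecmp(arg->hdr_field_name, name) == 0)
			return arg->value.str;
	}
	return NULL;
}
```
</pre>
<div>
A backend's query builder would do something similar for each field it indexes, then translate the terms it found into its own query language.
</div>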
<div>
<br>
</div>
<div>
Q4 : Refresh : this is very unclear. How could the view of the index not
</div>
<div>
be the "latest" one? What is the real meaning of this function?
</div>
<div>
In case of Xapian it might not matter if it automatically refreshes its
</div>
<div>
indexes between each query. But with some other indexes this could
</div>
<div>
happen:
</div>
<div>
<br>
</div>
<div>
- IMAP session is opened
</div>
<div>
- IMAP SEARCH is run, which opens and searches the index
</div>
<div>
- a new mail is delivered to the mailbox and indexed
</div>
<div>
- IMAP SEARCH is run. Without refresh() it doesn't see the newly
</div>
<div>
indexed mail and doesn't include it in the search results.
</div>
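<div>
The scenario above can be modelled with a toy index in C. This is a sketch under assumptions, not how any real backend stores data: the sketch_index and sketch_reader types are hypothetical, and visibility is reduced to a single "highest UID seen" value.
</div>
<pre>
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy index: the writer bumps `revision` every time it commits a mail. */
struct sketch_index {
	uint32_t revision;
	uint32_t highest_uid;
};

/* A reader works on the snapshot it took when it was opened. */
struct sketch_reader {
	const struct sketch_index *index;
	uint32_t seen_revision;
	uint32_t visible_highest_uid;
};

/* Re-read the index state so later searches see new commits. */
void sketch_reader_refresh(struct sketch_reader *r)
{
	r->seen_revision = r->index->revision;
	r->visible_highest_uid = r->index->highest_uid;
}

void sketch_reader_open(struct sketch_reader *r, const struct sketch_index *idx)
{
	r->index = idx;
	sketch_reader_refresh(r);
}

/* Writer side: a newly delivered mail gets indexed and committed. */
void sketch_index_add(struct sketch_index *idx, uint32_t uid)
{
	idx->highest_uid = uid;
	idx->revision++;
}

/* A search only sees mails that were in the reader's snapshot. */
bool sketch_reader_can_see(const struct sketch_reader *r, uint32_t uid)
{
	return uid <= r->visible_highest_uid;
}
```
</pre>
<div>
Opening a reader, adding UID 11 on the writer side and searching again leaves UID 11 invisible until sketch_reader_refresh() is called, which is the gap refresh() is meant to close.
</div>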
<div>
<br>
</div>
<div>
Q5 : Rescan : is it just about removing all indexes for a specific
</div>
<div>
mailbox?
</div>
<div>
It's run when "doveadm fts rescan" is run manually. Usually
that's only
</div>
<div>
run manually to fix up some brokenness. So it's intended to verify that
</div>
<div>
the current mailbox contents match the FTS indexes:
</div>
<div>
- If there are any mails in FTS index that no longer exist in the
</div>
<div>
actual mailbox, delete those mails from FTS
</div>
<div>
- If FTS is missing any mails in the middle of the mailbox, make sure
</div>
<div>
that the next mailbox indexing will index those missing mails. I think
</div>
<div>
currently this basically means reindexing all the mails since the first
</div>
<div>
missing mail, even the mails that are already in the index.
</div>
<div>
<br>
</div>
<div>
fts-lucene implements this, but other FTS backends are lazy and simply
</div>
<div>
rebuild all mails. Actually fts-solr is bad because it doesn't even
</div>
<div>
delete the extra mails.
</div>
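<div>
The two rescan checks described above amount to comparing two sorted UID lists. Here is a minimal sketch in C; sketch_rescan() and its result struct are hypothetical, and it assumes the backend can enumerate the UIDs it has indexed, which is not something the FTS API provides as such.
</div>
<pre>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_rescan_result {
	size_t stale_count;          /* UIDs in the index but not in the mailbox */
	uint32_t first_missing_uid;  /* 0 if the index has no gap */
};

/* Both arrays must be sorted ascending without duplicates. Stale UIDs
   are written to `stale`, which must have room for index_count entries. */
struct sketch_rescan_result
sketch_rescan(const uint32_t *box_uids, size_t box_count,
              const uint32_t *index_uids, size_t index_count,
              uint32_t *stale)
{
	struct sketch_rescan_result res = { 0, 0 };
	size_t b = 0, i = 0;

	while (b < box_count || i < index_count) {
		if (i == index_count ||
		    (b < box_count && box_uids[b] < index_uids[i])) {
			/* Mail exists but is not indexed: remember the first gap. */
			if (res.first_missing_uid == 0)
				res.first_missing_uid = box_uids[b];
			b++;
		} else if (b == box_count || index_uids[i] < box_uids[b]) {
			/* Indexed mail no longer exists: delete it from the index. */
			stale[res.stale_count++] = index_uids[i];
			i++;
		} else {
			/* Present on both sides: nothing to do. */
			b++;
			i++;
		}
	}
	return res;
}
```
</pre>
<div>
The stale list maps to the first bullet (delete from FTS), and first_missing_uid maps to the second: resetting the last indexed UID to just below it makes the next indexing run cover the gap, at the cost of reindexing everything after it.
</div>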
<div>
<br>
</div>
<div>
Q6 : lookup_multi : isn't the function the same for all plugins (see
</div>
<div>
below)? And finally, for fts_backend_xxxx_lookup_multi, why is that
</div>
<div>
backend dependent?
</div>
<div>
This function is called only when searching in virtual folders. So for
</div>
<div>
example the virtual "All mails" folder, which would contain all
mails in
</div>
<div>
all folders. In that case boxes[] would contain a list of all the user's
</div>
<div>
folders, except Trash and Spam. If lookup_multi() isn't implemented
</div>
<div>
(left to NULL), the search is run separately via lookup() for each
</div>
<div>
folder. With lookup_multi() there can be just one lookup, and the
</div>
<div>
backend can filter only the wanted folders and return them directly. So
</div>
<div>
it's an optimization for FTS indexes that support user-global searches
</div>
<div>
rather than only per-folder searches.
</div>
<div>
<br>
</div>
<div>
static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
</div>
<div>
struct mailbox *const boxes[], struct mail_search_arg *args, enum
</div>
<div>
fts_lookup_flags flags, struct fts_multi_result *result)
</div>
<div>
{
</div>
<div>
struct xapian_fts_backend_update_context *ctx =
</div>
<div>
(struct xapian_fts_backend_update_context *)_ctx;
</div>
<div>
<br>
</div>
<div>
int i=0;
</div>
<div>
<br>
</div>
<div>
while(boxes[i]!=NULL)
</div>
<div>
{
</div>
<div>
if(fts_backend_xapian_lookup(_backend,boxes[i],args,flags,&result->box_results[i])<0)
</div>
<div>
return -1;
</div>
<div>
i++;
</div>
<div>
}
</div>
<div>
return 0;
</div>
<div>
}
</div>
<div>
See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it
</div>
<div>
basically does this.
</div>
<div>
<br>
</div>
<div>
For "rescan" and "optimize", wouldn't it be the Dovecot core that
</div>
<div>
indicates which mails are to be dismissed (expunged), or asks again for
</div>
<div>
indexing of a particular UID (or all of them)? Why would the backend be
</div>
<div>
aware of the transactions on the mailbox?
</div>
<div>
rescan() is about fixing up a more or less broken index, or simply to
</div>
<div>
verify that it's all ok. So core doesn't know what messages exist in
the
</div>
<div>
FTS index and can't request specific reindexing or expunging. I guess an
</div>
<div>
alternative API could have been to have functions that iterate through
</div>
<div>
all mails in the index, and use that to implement rescan in core. Now
</div>
<div>
thinking about it, that sounds like a simpler and better way.
</div>
<div>
<br>
</div>
<div>
optimize() is currently done only when explicitly running "doveadm fts
</div>
<div>
optimize", which requests running a slower index optimization. Depends
</div>
<div>
on the FTS backend whether this is useful or not.
</div>
<div>
<br>
</div>
<div>
There is already "fts_backend_xxx_update_expunge", so I believe the
</div>
<div>
management of the expunged messages is *NOT* in the backend, right?
</div>
<div>
Normally when mails are expunged, update_expunge() is called to notify
</div>
<div>
FTS backend that it should delete the mail also from FTS index.
</div>
<div>
<br>
</div>
<div>
.flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT,*-> what other flags ?*
</div>
<div>
You probably want to use FTS_BACKEND_FLAG_FUZZY_SEARCH only like Solr.
</div>
<div>
See enum fts_backend_flags in fts-api-private.h
</div>
</blockquote>
<div>
<br>
</div>
<div class="io-ox-signature">
---
<br>Aki Tuomi
</div>
</body>
</html>
There is no point in a separate plugin; the purpose is to replace squat as the default FTS (Solr being a nightmare).
On 2019-01-11 18:23, Aki Tuomi wrote:
> I would recommend making this a standalone plugin for now instead of trying to keep it in core fts.
The patch below resolves the compilation error:
$ diff -p compat.h compat.h.joan
*** compat.h 2019-01-11 20:21:00.726625427 +0100
--- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
*************** struct iovec;
*** 202,207 ****
--- 202,211 ----
ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
#endif
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+
#if !defined(HAVE_PREAD) || defined(PREAD_WRAPPERS) || defined(PREAD_BROKEN)
# ifndef IN_COMPAT_C
# define pread i_my_pread
*************** ssize_t i_my_pread(int fd, void *buf, si
*** 211,216 ****
--- 215,225 ----
ssize_t i_my_pwrite(int fd, const void *buf, size_t count, off_t offset);
#endif
+ #ifdef __cplusplus
+ }
+ #endif
+
+
#ifndef HAVE_SETEUID
# define seteuid i_my_seteuid
int i_my_seteuid(uid_t euid);
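
For anyone wondering why the guard fixes the error: the wrapper declarations were being seen once with C linkage and once with C++ linkage when the header was pulled into a C++ plugin. Below is a minimal sketch of the same guard pattern in C. sketch_pread() is a hypothetical stand-in for i_my_pread(), and the stub body exists only so the example links; it is not the real wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical header section, shown inline: the guard makes a C++
   compiler give the declaration C linkage, and is skipped entirely
   when the file is compiled as C. */
#ifdef __cplusplus
extern "C" {
#endif

ssize_t sketch_pread(int fd, void *buf, size_t count, off_t offset);

#ifdef __cplusplus
}
#endif

/* Stub definition so the example links; it reads nothing and only
   validates its arguments. */
ssize_t sketch_pread(int fd, void *buf, size_t count, off_t offset)
{
	if (fd < 0 || buf == NULL || offset < 0)
		return -1;
	(void)count;
	return 0;
}
```

The same file compiles unchanged as C and as C++, which is the property the plugin needs since the rest of Dovecot is C.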
To integrate the plugin into the source tree, the following diffs are needed:
$ diff -p configure.ac configure.ac.joan
*** configure.ac 2019-01-11 20:19:47.905942264 +0100
--- configure.ac.joan 2019-01-11 17:54:58.433381828 +0100
*************** AS_HELP_STRING([--with-solr], [Build wit
*** 172,177 ****
--- 172,184 ----
TEST_WITH(solr, $withval),
want_solr=no)
+ AC_ARG_WITH(xapian,
+ AS_HELP_STRING([--with-xapian], [Build with Xapian full text search support]),
+ TEST_WITH(xapian, $withval),
+ want_xapian=auto)
+ AM_CONDITIONAL(BUILD_XAPIAN, test "$want_xapian" = "yes")
+
+
AC_ARG_WITH(sodium,
AS_HELP_STRING([--with-sodium], [Build with libsodium support (enables argon2, default: auto)]),
TEST_WITH(sodium, $withval),
*************** DOVECOT_WANT_SOLR
*** 746,751 ****
--- 753,759 ----
DOVECOT_WANT_CLUCENE
DOVECOT_WANT_STEMMER
DOVECOT_WANT_TEXTCAT
+ DOVECOT_WANT_XAPIAN
DOVECOT_WANT_ICU
*************** fi
*** 757,762 ****
--- 765,774 ----
if test $have_solr = no; then
not_fts="$not_fts solr"
fi
+ if test $have_xapian = no; then
+ not_fts="$not_fts xapian"
+ fi
+
dnl **
dnl ** Settings
*************** src/plugins/fs-compress/Makefile
*** 899,904 ****
--- 911,917 ----
src/plugins/fts/Makefile
src/plugins/fts-lucene/Makefile
src/plugins/fts-solr/Makefile
+ src/plugins/fts-xapian/Makefile
src/plugins/fts-squat/Makefile
src/plugins/last-login/Makefile
src/plugins/lazy-expunge/Makefile
$ diff -p Makefile.am Makefile.am.joan
*** Makefile.am 2019-01-11 20:22:23.910740574 +0100
--- Makefile.am.joan 2019-01-11 17:51:19.051153270 +0100
*************** DISTCLEANFILES = \
*** 99,105 ****
distcheck-hook:
if which scan-build > /dev/null; then \
cd $(distdir)/_build; \
! scan-build -o scan-reports ../configure --with-ldap=auto --with-pgsql=auto --with-mysql=auto --with-sqlite=auto --with-solr=auto --with-gssapi=auto --with-libwrap=auto; \
rm -rf scan-reports; \
scan-build -o scan-reports make 2>&1 || exit 1; \
if ! rmdir scan-reports 2>/dev/null; then \
--- 99,105 ----
distcheck-hook:
if which scan-build > /dev/null; then \
cd $(distdir)/_build; \
! scan-build -o scan-reports ../configure --with-ldap=auto --with-pgsql=auto --with-mysql=auto --with-sqlite=auto --with-solr=auto --with-xapian=auto --with-gssapi=auto --with-libwrap=auto; \
rm -rf scan-reports; \
scan-build -o scan-reports make 2>&1 || exit 1; \
if ! rmdir scan-reports 2>/dev/null; then \
What about the other questions?
1 - What does "subargs" represent in mail_search_args?
2 - For rescan: who is responsible for passing the emails again? Does
the Dovecot core send all the emails to be indexed again, or should the
FTS backend somehow access the mailbox and read all the emails itself?
Wouldn't simply saying "delete the whole index; get_last_uid is now 0"
be the easy way? Or must the FTS backend process all the emails itself
(and block the current thread, as a mailbox may be quite large)?
3 - For get_last_uid: the explanation is still unclear to me. "If there
is a gap, then indexer first indexes all the missing" means that at a
certain point the indexer may be rebuilding an earlier email, so the
*last* UID is something different from the max. And how does the indexer
know whether there is a gap without calling the FTS backend (which it
does not do, as there is no function for that)?
Thank you
On 2019-01-11 18:27, Joan Moreau wrote:
> There is no point into a separate plugin, the purpose is to replace squat as the default fts (solr being a nightmare)