Timo Sirainen
2018-May-24 13:33 UTC
Bug: Dovecot index loosing sync with FTS despite "fts_autoindex = yes"
On 21 May 2018, at 14.11, kadafax at gmail.com wrote:> > Le 21/05/2018 ? 12:38, Aki Tuomi a ?crit : >> can you try turning on pluign { fts_enforced = yes } and repeat your test? > > Same (wrong) result: > 1. Send an email with "too6Ouka" in the body > > 2. Search against "too6Ouka": > # doveadm search -u username mailbox INBOX body too6Ouka > --> No result > > 3. Force re-index: > # doveadm fts rescan -u username > > 4. Search again against "too6Ouka": > # doveadm search -u username mailbox INBOX body too6Ouka > --> e09cce0283e8695ab760000002deed92 29055 > > Don't know if relevant, but on a side note, if I send a second message > with "too6Ouka" in the body, followed by: > # doveadm -v index -u username Inbox > --> doveadm(username): Info: INBOX: Cache is already up to date > And a search against the pattern immediately return only one result > instead of two.So it looks like after "doveadm fts rescan" you can do initial indexing successfully. Afterwards the index isn't updated at all, unless you rebuild it entirely. fts_autoindex=yes appears to work as expected, because after fts rescan you did a search which automatically updated the index. So this doesn't have anything to do with fts_autoindex, just that your index updates don't seem to work at all. What does it show in the header if you do: doveadm dump /var/vmail/username/dovecot.index Especially what is in the "fts" header vs. "next uid" header? Does the UID in "fts" header keep changing every time you save a new mail? I suppose it will. You could also monitor (e.g. tcpdump/wireshark) the network traffic between Dovecot <-> Solr what happens when a new mail arrives. I suspect Dovecot sends it to Solr, which for whatever reason just ignores the update.
kfx
2018-May-28 10:28 UTC
Bug: Dovecot index loosing sync with FTS despite "fts_autoindex = yes"
On 24/05/2018 15:33, Timo Sirainen wrote:> On 21 May 2018, at 14.11, kadafax at gmail.com wrote: >> >> Le 21/05/2018 ? 12:38, Aki Tuomi a ?crit : >>> can you try turning on pluign { fts_enforced = yes } and repeat your test? >> >> Same (wrong) result: >> 1. Send an email with "too6Ouka" in the body >> >> 2. Search against "too6Ouka": >> # doveadm search -u username mailbox INBOX body too6Ouka >> --> No result >> >> 3. Force re-index: >> # doveadm fts rescan -u username >> >> 4. Search again against "too6Ouka": >> # doveadm search -u username mailbox INBOX body too6Ouka >> --> e09cce0283e8695ab760000002deed92 29055 >> >> Don't know if relevant, but on a side note, if I send a second message >> with "too6Ouka" in the body, followed by: >> # doveadm -v index -u username Inbox >> --> doveadm(username): Info: INBOX: Cache is already up to date >> And a search against the pattern immediately return only one result >> instead of two. > > So it looks like after "doveadm fts rescan" you can do initial indexing successfully. Afterwards the index isn't updated at all, unless you rebuild it entirely. fts_autoindex=yes appears to work as expected, because after fts rescan you did a search which automatically updated the index. So this doesn't have anything to do with fts_autoindex, just that your index updates don't seem to work at all. > > What does it show in the header if you do: > doveadm dump /var/vmail/username/dovecot.indexDetected file type: index -- INDEX: /var/vmail/username/dovecot.index version .................. = 7.3 base header size ......... = 120 header size .............. = 640 record size .............. = 16 compat flags ............. = 1 index id ................. = 1527164208 (2018-05-24 14:16:48) flags .................... = 0 uid validity ............. = 1516890243 (2018-01-25 15:24:03) next uid ................. = 30104 messages count ........... = 17068 seen messages count ...... = 17011 deleted messages count ... = 0 first recent uid ......... = 30104 first unseen uid lowwater = 393 first deleted uid lowwater = 30087 log file seq ............. = 14 log file tail offset ..... = 28144 log file head offset ..... = 28144 log2 rotate time ......... = 1527487933 (2018-05-28 08:12:13) last temp file scan ...... = 0 (1970-01-01 01:00:00) day stamp ................ = 1527458400 (2018-05-28 00:00:00) day first uid[0] ......... = 30037 day first uid[1] ......... = 29921 day first uid[2] ......... = 29814 day first uid[3] ......... = 29669 day first uid[4] ......... = 2 day first uid[5] ......... = 0 day first uid[6] ......... = 0 day first uid[7] ......... = 0 -- Extension 0 -- name ........ = maildir hdr_size .... = 36 reset_id .... = 0 record_offset = 0 record_size . = 0 record_align = 0 header - new_check_time .... = 2018-05-28 11:48:14 - new_mtime ......... = 2018-05-28 11:47:59 - new_mtime_nsecs ... = 129602175 - cur_check_time .... = 2018-05-28 11:48:14 - cur_mtime ......... = 2018-05-28 11:48:01 - cur_mtime_nsecs.... = 277650283 - uidlist_mtime ..... = 2018-05-28 11:47:58 - uidlist_mtime_nsecs = 521588557 - uidlist_size ...... = 1334026 -- Extension 1 -- name ........ = keywords hdr_size .... = 292 reset_id .... = 0 record_offset = 5 record_size . = 3 record_align = 1 -- Extension 2 -- name ........ = hdr-vsize hdr_size .... = 16 reset_id .... = 0 record_offset = 0 record_size . = 0 record_align = 8 header - highest uid . = 0 - message count = 0 - vsize ....... = 0 -- Extension 3 -- name ........ = cache hdr_size .... = 0 reset_id .... = 1527164233 record_offset = 8 record_size . = 4 record_align = 4 -- Extension 4 -- name ........ = vsize hdr_size .... = 0 reset_id .... = 0 record_offset = 12 record_size . = 4 record_align = 4 -- Extension 5 -- name ........ = fts hdr_size .... = 12 reset_id .... = 0 record_offset = 0 record_size . = 0 record_align = 0 header - last_indexed_uid ..... = 30103 - settings_checksum .... = 0 -- Keywords -- 0 = unknown-1 1 = unknown-2 2 = unknown-5 3 = unknown-8 4 = unknown-6 5 = unknown-3 6 = unknown-7 7 = unknown-0 8 = NonJunk 9 = $Forwarded 10 = $label1 11 = $label2 12 = Junk 13 = $label3 14 = $label4> > Especially what is in the "fts" header vs. "next uid" header? Does the UID in "fts" header keep changing every time you save a new mail? I suppose it will.Diff between 2 emails: next uid = 30104 | next uid = 30105 last_indexed_uid = 30103 | last_indexed_uid = 30104> You could also monitor (e.g. tcpdump/wireshark) the network traffic between Dovecot <-> Solr what happens when a new mail arrives. I suspect Dovecot sends it to Solr, which for whatever reason just ignores the update.################### TCPDUMP #################################### POST /solr/dovecot/update HTTP/1.1 Host: localhost:8983 Date: Mon, 28 May 2018 10:18:05 GMT Transfer-Encoding: chunked Connection: Keep-Alive Content-Type: text/xml <add><doc><field name="uid">37581</field><field name="box">e0c58a3093235153110800003ea484a8</field><field name="user">username</field><field name="id">37581/e0c58a3093235153110800003ea484a8/username</field><field name="body">Search Pattern: Kai8oovi -- username </field><field name="from">"username" <username at mydomain> </field><field name="hdr">From: "username" <username at mydomain> Subject: New search test: 4 To: "username" <username at mydomain> Message-ID: <86442b87-aaad-c6bc-86f6-a74551a564c6 at mydomain> Date: Mon, 28 May 2018 12:17:41 +0200 User-Agent: _ MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: fr-xx-classique+reforme1990 Content-Transfer-Encoding: 7bit </field><field name="subject">New search test: 4 </field><field name="to">"username" <username at mydomain> </field></doc></add> ######### SOLR'S RESPONSE ############### <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">0</int> </lst> </response> ################### /TCPDUMP ####################################
Timo Sirainen
2018-May-28 11:04 UTC
Bug: Dovecot index loosing sync with FTS despite "fts_autoindex = yes"
On 28 May 2018, at 13.28, kfx <kadafax at gmail.com> wrote:> > >> Especially what is in the "fts" header vs. "next uid" header? Does the UID in "fts" header keep changing every time you save a new mail? I suppose it will. > > Diff between 2 emails: > next uid = 30104 | next uid = 30105 > last_indexed_uid = 30103 | last_indexed_uid = 30104So Dovecot thinks it has indexed everything.>> You could also monitor (e.g. tcpdump/wireshark) the network traffic between Dovecot <-> Solr what happens when a new mail arrives. I suspect Dovecot sends it to Solr, which for whatever reason just ignores the update. > > ################### TCPDUMP #################################### > POST /solr/dovecot/update HTTP/1.1 > Host: localhost:8983 > Date: Mon, 28 May 2018 10:18:05 GMT > Transfer-Encoding: chunked > Connection: Keep-Alive > Content-Type: text/xml > > <add><doc><field name="uid">37581</field><field name="box">e0c58a3093235153110800003ea484a8</field><field name="user">username</field><field name="id">37581/e0c58a3093235153110800003ea484a8/username</field><field name="body">Search Pattern: Kai8oovi..> </field></doc></add>And Dovecot sends the mail.> ######### SOLR'S RESPONSE ############### > <?xml version="1.0" encoding="UTF-8"?> > <response> > > <lst name="responseHeader"> > <int name="status">0</int> > <int name="QTime">0</int> > </lst> > </response>And Solr receives it. Your tcpdump doesn't show <commit softCommit="true" waitSearcher="true"/> being sent though. Do you see it being sent anywhere? Does it make the mails visible if you run it yourself? Or if you run hard commit? : curl http://<host>:8983/solr/update?commit=true -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://dovecot.org/pipermail/dovecot/attachments/20180528/ae857324/attachment-0001.html>
Possibly Parallel Threads
- Bug: Dovecot index loosing sync with FTS despite "fts_autoindex = yes"
- Bug: Dovecot index loosing sync with FTS despite "fts_autoindex = yes"
- Bug: Dovecot index loosing sync with FTS despite "fts_autoindex = yes"
- Solr connection timeout hardwired to 60s
- Bug: Dovecot index loosing sync with FTS despite "fts_autoindex = yes"