Hi,

In continuation of the issue I posted about a long time ago:

http://www.dovecot.org/list/dovecot/2014-August/097362.html

I did further testing today on a fresh Debian install with the latest Dovecot and observed an undesired behavior. I am using fts_lucene and the following sequence of commands on an empty test account, me at myself.com:

doveadm expunge -u 'my at myself.com' mailbox 'INBOX' all
cat test.eml | /usr/lib/dovecot/dovecot-lda -e -f you at yourself.com -d me at myself.com
doveadm search -u 'akash at mailjol.in' mailbox 'INBOX' text ABCD

The search command does or doesn't find the email depending on slight variations in the content of test.eml. Here are the results:

test.eml content:
-----------------------------
From: you at yourself.com
To: me at myself.com
Subject: Test Message
Content-Type: text/html

<div id="mydiv">ABCD 1234</div>
-----------------------------
RESULT: OK. The email is found.

test.eml content (double quotes inside the div tag replaced with single quotes):
-----------------------------
From: you at yourself.com
To: me at myself.com
Subject: Test Message
Content-Type: text/html

<div id='mydiv'>ABCD 1234</div>
-----------------------------
RESULT: None. The email isn't found.

test.eml content (single quotes in the div tag, but the Content-Type header removed):
-----------------------------
From: you at yourself.com
To: me at myself.com
Subject: Test Message

<div id='mydiv'>ABCD 1234</div>
-----------------------------
RESULT: OK. The email is found.

What could be the reason for this?

-Akash
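For anyone repeating the test, the three messages above can be generated rather than edited by hand. This is only a convenience sketch: the file names and the helper names (build_message, write_variants) are made up for illustration, and the addresses are kept in the obfuscated form shown in the archive, so substitute real ones before delivering with dovecot-lda.

```python
import os

# Headers shared by all three variants (addresses as obfuscated in the archive).
HEADERS = (
    "From: you at yourself.com\n"
    "To: me at myself.com\n"
    "Subject: Test Message\n"
)


def build_message(quote, with_content_type):
    """Return one test message; `quote` is the attribute quote character."""
    headers = HEADERS
    if with_content_type:
        headers += "Content-Type: text/html\n"
    body = "<div id={q}mydiv{q}>ABCD 1234</div>\n".format(q=quote)
    # A blank line separates the headers from the body.
    return headers + "\n" + body


# Hypothetical file names, one per case in the report.
VARIANTS = {
    "double-quotes.eml": build_message('"', True),
    "single-quotes.eml": build_message("'", True),
    "single-quotes-no-content-type.eml": build_message("'", False),
}


def write_variants(directory="."):
    """Write every variant into `directory`."""
    for name, text in VARIANTS.items():
        with open(os.path.join(directory, name), "w") as handle:
            handle.write(text)
```

Each generated file can then be piped through the same expunge, deliver, and search sequence listed above.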
The issue is probably linked to:

http://www.dovecot.org/list/dovecot-cvs/2014-May/024462.html

But that change-set is from 2014 and I am using Dovecot 2.2.19, so I don't understand why I am still seeing this behavior.

-Akash
I tried the latest source from HG, and also tested with solr in addition to lucene, which I had tested previously. The problem with single quotes in HTML is still there.

The revision:

http://hg.dovecot.org/dovecot-2.2/rev/ad028a950248

should have solved it, but the relevant code no longer exists in src/plugins/fts/fts-parser-html.c. It seems to have been moved into lib-mail. The file src/lib-mail/mail-html2text.c does contain something about single quotes, but to no avail. Can someone at least confirm that this issue exists?
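To make clear what "handling single quotes" means for an HTML-to-text step, here is a minimal sketch of a tag stripper that treats both ' and " as attribute-value delimiters. It is not the code in mail-html2text.c, and the thread does not state the actual cause of the bug; the function name strip_tags is made up for illustration.

```python
def strip_tags(html):
    """Drop HTML tags, treating ' and " as attribute-value delimiters."""
    out = []
    in_tag = False
    quote = None  # the quote character currently open inside a tag, if any
    for ch in html:
        if not in_tag:
            if ch == "<":
                in_tag = True
            else:
                out.append(ch)
        elif quote is not None:
            # Inside a quoted attribute value: only the matching quote ends it.
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ">":
            in_tag = False
    return "".join(out)
```

With a scanner like this, both forms of the test body reduce to the same text, which is the behavior expected from the indexer. A scanner that tracks only one of the two quote characters could mis-detect where a tag ends when an attribute value contains ">".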
On Wed, Oct 14, 2015 at 08:33:56PM +0530, Akash wrote:
> Tried latest source from HG and with solr also apart from lucene which I
> tested previously. The problem with single quotes in HTML is still there.
>
> The revision:
>
> http://hg.dovecot.org/dovecot-2.2/rev/ad028a950248
>
> should have solved it but the relevant code no longer exists in
> src/plugins/fts/fts-parser-html.c. Seems like it has been moved into
> lib-mail. The file src/lib-mail/mail-html2text.c does contain something
> about single quotes but to no avail. Can someone at-least confirm existence
> of this issue?

Thanks for the report. Bug found. My bad. A patch is working its way through the internal process, and will be in the public tree soon.

Cheers,
Phil