Hi,
I am trying to use the fts_solr plugin and am having some
success. Unfortunately, some spam messages I had lying around cause
Solr to return an error, e.g.:
HTTP/1.1 400 ParseError at [row,col]:[5,29] Message: An invalid XML character
(Unicode: 0xd84e) was found in the element content of the document.
I assume that these messages do indeed contain invalid Unicode.
However, my searches seem to hang when they get an error back from
Solr, which causes other problems, so it would be nice if these
messages did not trigger an error from Solr.
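To illustrate the kind of filtering that would avoid this error, here is a minimal Python sketch. It is not Dovecot's or Solr's actual code, and the function names are hypothetical. It drops every code point that falls outside the XML 1.0 "Char" production, which excludes surrogates such as 0xd84e, before the text would be placed in an XML document.

```python
def is_valid_xml_char(ch):
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # excludes other C0 control characters
        or 0xE000 <= cp <= 0xFFFD       # skips surrogates U+D800..U+DFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


def strip_invalid_xml_chars(text, replacement=""):
    """Replace each character that XML 1.0 forbids with `replacement`."""
    return "".join(ch if is_valid_xml_char(ch) else replacement for ch in text)


if __name__ == "__main__":
    # A lone surrogate (the code point from the error above) inside a string.
    sample = "spam\ud84esubject"
    print(strip_invalid_xml_chars(sample))
```

Passing replacement="\ufffd" instead would keep a visible marker where each invalid character was, rather than silently dropping it.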
I isolated some of these messages and have attached them. I hope that
the problematic characters come through properly. I have more messages
that cause this problem (sometimes with different Unicode code points
than the one in the message above), but I don't want to clog the list
up.
Thank you for your help.
best, Erik Hetzner
Config:
# 1.2.13: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-23-generic i686 Ubuntu 10.04.1 LTS
log_timestamp: %Y-%m-%d %H:%M:%S
protocols: imaps
login_dir: /usr/local/stow/dovecot-1.2.13/var/run/dovecot/login
login_executable: /usr/local/stow/dovecot-1.2.13/libexec/dovecot/imap-login
mail_privileged_group: mail
mail_plugins: virtual fts fts_solr
namespace:
  type: private
  separator: /
  location: maildir:~/Maildir
  inbox: yes
  list: yes
  subscriptions: yes
namespace:
  type: private
  separator: /
  prefix: virtual/
  location: virtual:~/Maildir_virtual:LAYOUT=maildir++
  list: yes
  subscriptions: yes
auth default:
  passdb:
    driver: pam
  userdb:
    driver: passwd
plugin:
  fts: solr
  fts_solr: url=http://localhost:8080/solr/ break-imap-search debug
-------------- next part --------------
An embedded message was scrubbed...
From: "Yoshida Nozomi" <diwiuxttio at msn.com>
Subject: ???????????????v??????
Date: Fri, 02 Nov 2007 07:46:23 +0600
Size: 2185
URL:
<http://dovecot.org/pipermail/dovecot/attachments/20100820/2a7913da/attachment-0006.mht>
-------------- next part --------------
An embedded message was scrubbed...
From: "Efrain Mcleod" <qfijs at yahoo.com>
Subject: ?d???d???A???R??????????????
Date: Wed, 28 May 2008 13:00:06 -0400
Size: 2885
URL:
<http://dovecot.org/pipermail/dovecot/attachments/20100820/2a7913da/attachment-0007.mht>
-------------- next part --------------
An embedded message was scrubbed...
From: "Cecile Bravo" <pafxz at yahoo.com>
Subject: ???x?A?X?????????????????B
Date: Tue, 03 Jun 2008 03:04:34 +0300
Size: 2955
URL:
<http://dovecot.org/pipermail/dovecot/attachments/20100820/2a7913da/attachment-0008.mht>