Hi,
After some more digging, it seems to have to do with capital Umlauts.
So when I index the UTF-8 string "ägypten", I can search for "Ägypten"
and for "ägypten" and get results for both searches.
But when I index the UTF-8 string "Ägypten", I don't get any results,
whether I search for "Ägypten" or for "ägypten".
Is that a bug or am I missing something?
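
For reference, here is a minimal sketch of how the converted string can be
compared byte for byte with a UTF-8 literal before it is handed to Xapian.
It does not call Xapian at all. It assumes Ruby 1.9 or later and uses
String#encode in place of Iconv; the helper names latin9_to_utf8 and
hex_bytes are made up for this example.

```ruby
# Hypothetical helper: reinterpret raw bytes as ISO-8859-15 and
# transcode them to UTF-8 (stands in for the Iconv call; assumes Ruby >= 1.9).
def latin9_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_15).encode(Encoding::UTF_8)
end

# Hypothetical helper: render a string's bytes as hex for eyeballing.
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

# "Ägypten" as it would arrive from an ISO-8859-15 source:
# 0xC4 is the capital A-umlaut in that encoding.
legacy    = [0xC4, 0x67, 0x79, 0x70, 0x74, 0x65, 0x6E].pack("C*")
converted = latin9_to_utf8(legacy)

# The same word as a UTF-8 literal (escaped so the source file stays ASCII).
literal = "\u00C4gypten"

puts hex_bytes(converted)
puts hex_bytes(literal)
puts converted.bytes == literal.bytes
```

If the two hex dumps are identical, the conversion step is probably not
the culprit and the difference lies in how the capital letter is handled
at index or query time.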
Cheers,
Johannes
On Mon, Jan 24, 2011 at 2:07 PM, Johannes Fahrenkrug
<jfahrenkrug at gmail.com> wrote:
> Hi,
>
> I'm new to the list but I've been using Xapian along with the Ruby
> bindings and Xapit for over 1.5 years and it's working great. But now
> I've run into a very strange encoding issue.
>
> I'm using Xapian 1.0.11 on Solaris.
>
> This is the issue: I'm pulling ISO-8859-15 encoded data from a legacy
> database and I'm indexing it. Some of that data contains German Umlaut
> characters. When I search for those words, Xapian finds nothing. That
> should not surprise me since the docs say that Xapian expects UTF-8
> encoded strings. So I use Iconv to convert the strings from
> ISO-8859-15 to UTF-8 before I pass them to Xapian to be indexed, but it
> still doesn't work. The weird thing is, however, that when I just put
> a UTF-8 string literal into my Ruby code and return it in place of the
> actual string that should be indexed, it works. Even with Umlauts. So
> a UTF-8 String LITERAL works, but a UTF-8 String that has been
> converted from ISO-8859-15 does not.
>
> Does this sound familiar to anyone? Any help would be appreciated!
>
> - Johannes
>
> --
> springenwerk.com | github.com/jfahrenkrug | twitter.com/jfahrenkrug
>
--
springenwerk.com | github.com/jfahrenkrug | twitter.com/jfahrenkrug