Hi Eric!
On 15 Feb 2016, at 09:20, Eric Lindblad <geirfuglaps at yahoo.com> wrote:
> 2. Human Language Support
>
> Here is another [webpage Character Encoding (EUC-JP)] link regarding
> software using mecab.
> http://www.asahi-net.or.jp/~yw3t-trns/namazu/index.htm
It doesn't seem very active, so I don't know how useful pointing at a different
search engine is actually going to be. (Also, it is licensed only under the GPL,
so we'd want to be careful not to allow any code to migrate across; although it
looks like the code is in Perl, so that's probably not a concern.) But if
there's something helpful in there then please do add it to the resources list
for that project. A better link seems to be <http://www.namazu.org/>.
I also noticed that the mecab link was outdated, and have switched it to GitHub,
where the project does seem moderately active.
> Here is an article by Rob Pike and Ken Thompson (Plan 9).
> http://doc.cat-v.org/plan_9/4th_edition/papers/utf
Is that the one you mean? It seems to be a paper justifying choosing Unicode
over ISO 10646, which AIUI since Unicode 3.0 / 2000 is a moot distinction. The
paper also covers using a UTF (UTF-8 in particular, although I guess at the time
there wasn't another one) over a UCS (which would have been UCS-2 at the time,
now long abandoned by most people).
I agree though that we could do with something general about Unicode / UTF-8
usage in the documentation (which we can then link to from projects where this
is relevant). We have per-language notes (such as
https://getting-started-with-xapian.readthedocs.org/en/latest/language_specific.html#unicode),
but nothing that I can recall that talks more generally about character sets,
serialisation and so forth.
> 7. Applications
>
> It might be interesting to use Xapian with 9P2000.
Do you want to add a project to the wiki for this? It sounds like you have an
idea of what it would look like, which at the moment I don't (and I suspect a
number of potential students wouldn't either).
J
--
James Aylett, occasional trouble-maker
xapian.org