On Thu, Jul 12, 2007 at 10:18:07AM +0800, Gea-Suan Lin wrote:
> We use the Perl module Search::Xapian 1.0.2.0 to index ~4m articles (it's
> 26GB right now), but updating is slow (about 4 articles/sec, and I/O
> bound).
What are the machine's specs?
Are you setting XAPIAN_FLUSH_THRESHOLD?
> The articles are UTF-8 CJK text and we use bigrams to generate terms, so
> it's very easy to generate ~10k terms for a mid-size article. The article
> itself is not stored in Xapian, only the terms.
That is a lot more terms than is typical, so I'd expect indexing to be
slower, but 4 per second is very slow.
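To illustrate why bigram indexing produces so many terms: with overlapping character bigrams, a run of n characters yields n - 1 bigram occurrences, so the term count scales with article length rather than with vocabulary size. The helper below (`cjk_bigrams`, a hypothetical name) is a simplified sketch under the assumption that every non-whitespace character is CJK; it is not the poster's actual tokenizer and ignores punctuation and Latin-script runs.

```python
def cjk_bigrams(text: str) -> list[str]:
    """Return overlapping character bigrams, e.g. "ABCD" -> ["AB", "BC", "CD"].

    Hypothetical helper for illustration only: it drops whitespace and
    treats every remaining character as part of one continuous CJK run.
    """
    chars = [c for c in text if not c.isspace()]
    # n characters produce n - 1 overlapping pairs.
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


text = "全文檢索系統全文檢索"  # 10 characters
terms = cjk_bigrams(text)
print(len(terms), len(set(terms)))  # bigram occurrences, distinct bigrams
```

Here 10 characters give 9 bigram occurrences (6 distinct), so an article of roughly 10k characters would give on the order of 10k term occurrences to index.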
Cheers,
Olly