On Wed, Oct 15, 2008 at 02:16:15PM +0200, Jeroen van Dijk wrote:
> The indexing process got to 1.2 million records and then it lost the
> connection (my own fault, I guess) after 16 hours and had built up an
> indexing database of around 300 MB.
>
> Should I be suspicious or should I just wait a little longer?

That seems rather slow: 1.2 million records in 16 hours is only about
75,000 per hour. It depends on the data and the hardware, but
I'd expect more like a million documents per hour.

If you aren't already, try setting XAPIAN_FLUSH_THRESHOLD in the
environment to a value higher than the default of 10000. The best value
depends on the nature of the data and how much memory you have, but
1000000 is worth a try.
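
As a rough sketch of one way to do that from a wrapper script: the
Python below copies the current environment, sets
XAPIAN_FLUSH_THRESHOLD, and launches the indexer as a child process so
the variable is in place before indexing starts. The helper names
(indexer_env, run_indexer) and the indexer command are made up for
illustration; substitute whatever you actually run.

```python
import os
import subprocess


def indexer_env(threshold=1_000_000, base=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set.

    `base` defaults to the current process environment; passing a dict
    makes the helper easy to test in isolation.
    """
    env = dict(os.environ if base is None else base)
    # Environment variable values must be strings.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env


def run_indexer(command, threshold=1_000_000):
    """Run an indexer command (hypothetical) with the raised threshold."""
    return subprocess.run(command, env=indexer_env(threshold), check=True)
```

For example, run_indexer(["./my-indexer", "records.xml"]) would start
that (hypothetical) program with the threshold at 1000000. If memory is
tight, pass a smaller value and work upwards.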

I've just realised that we don't actually seem to document
XAPIAN_FLUSH_THRESHOLD anywhere, which probably explains why I have to
keep highlighting it on the mailing list! I'll write up something...

Cheers,
Olly