On Fri, Sep 05, 2008 at 11:53:03AM +1000, cel tix44 wrote:
> When running a simple indexing test, I noticed that Xapian generates a
> ~74 MB index database for ~24 MB of data.
>
> Is that the expected Index-To-Data size ratio?
It can vary quite a bit between data sets, but when indexing with
positions, the index and the source data are usually around the same
size with the flint backend.
> Is there a way to make the index smaller?
You might find that compacting the database with xapian-compact makes
a significant difference.
Culling stopwords at index time can save quite a bit of space. Also,
filtering out "junk" terms can help in some applications - for example,
when indexing email, ASCII art in signatures doesn't produce useful
terms for searching on.
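As a rough illustration of that kind of filtering, here is a minimal
sketch in plain Python. It does not use Xapian's API: the stopword list,
the tokenising regex, the is_junk() heuristic and the index_terms() name
are all assumptions made up for the example, and a real application
would tune them to its own data before passing terms to the indexer.

```python
import re

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}

# Terms are runs of ASCII letters and digits; everything else (including
# the punctuation that makes up most ASCII art) acts as a separator.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def is_junk(term):
    """Assumed heuristic: very long terms, or one character repeated."""
    return len(term) > 30 or (len(term) > 3 and len(set(term)) == 1)


def index_terms(text):
    """Return the lowercased terms worth indexing, in document order."""
    terms = []
    for match in TOKEN_RE.finditer(text):
        term = match.group().lower()
        if term in STOPWORDS or is_junk(term):
            continue  # skip terms that would only bloat the index
        terms.append(term)
    return terms
```

For instance, index_terms("The size of the index is xxxxxxxx large")
keeps only "size", "index" and "large"; the stopwords and the run of
repeated characters never reach the index.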
The development backend (chert) does quite a bit better too (the
postlist table is ~44% smaller for gmane).
Cheers,
Olly