On Tue, 2011-05-24 at 17:01 +0200, Cor Bosman wrote:
> Hi all, I've been playing with squat indexes. Up to about 300,000 emails
> in a single mailbox this was working flawlessly. The search index file is
> about 500MB at that time. I've now added some more emails, and at 450,000
> or so emails I'm seeing a serious problem with squat index creation. It
> takes...f o r e v e r. The .tmp file is being written so slowly that it
> will probably take 2-3 hours to create. Up to this point it took maybe a
> minute.
>
> Im doing this in an openvz container, so theoretically i may be hitting
some openvz resource limit. But ive upped all the limits and dont see any
improvements. I dont see any resources starvation either.
>
> Could there be some Dovecot issue when the search index reaches, say, 1GB?
> (I'm estimating that it's now trying to save about a 1GB search index.)
Initially Squat just builds a large unorganized index; the last step is
organizing it, and that step is the main problem with Squat's indexing
speed. The file is mmap()ed and then accessed in pretty random order. As
long as you have enough memory to keep all of this mmap()ed data in
physical memory, this works pretty fast, but otherwise the kernel starts
page faulting like crazy and it takes forever. That's why Squat has this
code:
/* Tell the kernel we're going to use the uidlist data, so it loads
   it into memory and keeps it there. */
(void)madvise(uidlist->mmap_base, uidlist->mmap_size, MADV_WILLNEED);
/* It also speeds up a bit for us to sequentially load everything
   into memory, although at least Linux catches up quite fast even
   without this code. Compiler can quite easily optimize away this
   entire for loop, but volatile seems to help with gcc 4.2. */
for (i = 0; i < uidlist->mmap_size; i += page_size)
	((const volatile char *)uidlist->data)[i];
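
If you want to experiment with the same trick outside of Dovecot, here's a
minimal standalone sketch of the pattern (the file name is just a
placeholder, not anything Dovecot uses): mmap() the file read-only, hint
the kernel with MADV_WILLNEED, then touch one byte per page sequentially
so that later random-order accesses don't page fault as much:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int fd = open("dovecot.index.search", O_RDONLY); /* placeholder path */
	struct stat st;
	size_t page_size = sysconf(_SC_PAGESIZE), i;
	const char *data;

	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;
	data = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		return 1;

	/* hint: we're about to use the whole mapping */
	(void)madvise((void *)data, st.st_size, MADV_WILLNEED);
	/* touch one byte per page sequentially to prefault it;
	   volatile keeps the compiler from dropping the loop */
	for (i = 0; i < (size_t)st.st_size; i += page_size)
		((const volatile char *)data)[i];

	/* ... random-order index accesses would go here ... */
	munmap((void *)data, st.st_size);
	close(fd);
	return 0;
}

Whether the prefault loop helps depends on how much free RAM you have: if
the whole mapping doesn't fit in memory, the kernel will just evict the
pages again before the random accesses get to them.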