Guys,

We have a very large maildir for email auditing purposes. It's currently at 600 GB and continues to grow.

Can dovecot handle this with squat indexing, or am I out of my mind?

Thanks!
John
On Mon, 2008-07-21 at 12:37 -0400, John Wells wrote:
> Guys,
>
> We have a very large maildir for email auditing purposes. It's
> currently at 600 GB and continues to grow.
>
> Can dovecot handle this with squat indexing, or am I out of my mind?

You can try of course, but that might be a bit too much. :) I've only tested with a 1,4 GB mailbox and memory usage went somewhere like 700 MB I think.

It would be nice if Squat was able to scale to infinitely large mailboxes, but currently I don't really see how that would be possible. There are two issues here:

1) It needs to keep a trie in memory containing all the 4 character blocks of messages. If the input data doesn't contain that many unique blocks, perhaps this doesn't grow too large with 600 GB of data. Maybe this could be somehow changed so that the rarely used trie branches would be written to disk when memory usage gets too high.

2) Once the entire index is created, Dovecot goes through it again and defragments all the pieces. This reduces the index size and speeds up lookups, but if the index doesn't fit entirely into memory this stage can take a really really long time. Originally I was thinking about dropping this stage since it seemed to take forever, but then I figured out that if I first sequentially read the entire index into memory before starting the defragmentation it would take a lot less time (with the 1,5 GB mailbox it dropped from somewhere around 10 mins -> 0,5 mins). But if your index is larger than what fits into memory, this sequential read is pointless.
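To illustrate issue 1 above: the following is a minimal sketch of a trie holding every overlapping 4-character block of the input text. It is not Dovecot's Squat code, and all names in it (BLOCK_LEN, add_blocks, count_nodes) are hypothetical. It only shows why memory use tracks the number of unique blocks rather than the raw size of the mailbox.

```python
# Illustrative sketch only: this is NOT Dovecot's Squat implementation.
# All names here are hypothetical and chosen for the example.

BLOCK_LEN = 4  # block size mentioned in the thread


def add_blocks(trie, text, block_len=BLOCK_LEN):
    """Insert every overlapping block_len-character block of text.

    The trie is a nested dict: each key is one character, each value
    is the child node. Returns how many nodes this call created.
    """
    created = 0
    for start in range(len(text) - block_len + 1):
        node = trie
        for ch in text[start:start + block_len]:
            child = node.get(ch)
            if child is None:
                child = {}
                node[ch] = child
                created += 1
            node = child
    return created


def count_nodes(trie):
    """Count all nodes below the root, without recursion."""
    total = 0
    stack = [trie]
    while stack:
        node = stack.pop()
        total += len(node)
        stack.extend(node.values())
    return total


if __name__ == "__main__":
    trie = {}
    for msg in ["hello world", "hello there", "hello world"]:
        created = add_blocks(trie, msg)
        print(f"{msg!r}: {created} new nodes, {count_nodes(trie)} total")
```

In this sketch, indexing a message whose blocks are already present creates no new nodes, so the trie stops growing once the input stops introducing new blocks. Whether that happens early enough for a 600 GB archive depends on the data.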
On Mon, Jul 21, 2008 at 12:50 PM, Timo Sirainen <tss at iki.fi> wrote:
> On Mon, 2008-07-21 at 12:37 -0400, John Wells wrote:
>> Guys,
>>
>> We have a very large maildir for email auditing purposes. It's
>> currently at 600 GB and continues to grow.
>>
>> Can dovecot handle this with squat indexing, or am I out of my mind?
>
> You can try of course, but that might be a bit too much. :) I've only
> tested with a 1,4 GB mailbox and memory usage went somewhere like 700 MB
> I think.

Aha... I see... I was under the mistaken impression that this was a disk-based index.

Given that Squat seems infeasible, can anyone recommend another approach? I'll look at Lucene integration, but if anyone knows of a Dovecot way or of another tool that would do this effectively, commercial or open source, please let me know.

Thanks!
John