> It depends on what you consider reasonable.
>
> The processing time of file operations that iterate through a mailbox
> will generally go up proportionately with size. If you do a text search
> without some indexing system like Solr, it will take a very long time.
>
> If the mailbox is just some archive that you pile up and forget about
> except for once-in-a-blue-moon retrieval, then it might be reasonable.
>
> If it's an active mailbox, it will be a pain to navigate, in the same
> way as a single folder with 100K files or a file cabinet with huge stacks
> of envelopes.
>
> I would guess some partitioning of the large mailboxes into smaller
> mailboxes would help with active mailboxes. Most people spend most of
> their time on new/recent messages, so making time, size, or subject
> based volumes wouldn't be a bad idea.
>
> If the bulk of the size is redundant copies of attachments, then
> Dovecot's *dbox supports de-duping, which would also help.

So, generally speaking, you don't want to have inboxes that just sync all
day long, due to massive amounts of small files in the inbox. This may be
OK in the case of a rarely accessed archive folder, but not good for
regularly accessed inboxes, etc.?

On Fri, 8 May 2020, asai at globalchangemusic.org wrote:

>> It depends on what you consider reasonable.
>>
>> The processing time of file operations that iterate through a mailbox
>> will generally go up proportionately with size. If you do a text search
>> without some indexing system like Solr, it will take a very long time.
>>
>> If the mailbox is just some archive that you pile up and forget about
>> except for once-in-a-blue-moon retrieval, then it might be reasonable.
>>
>> If it's an active mailbox, it will be a pain to navigate, in the same
>> way as a single folder with 100K files or a file cabinet with huge stacks
>> of envelopes.
>>
>> I would guess some partitioning of the large mailboxes into smaller
>> mailboxes would help with active mailboxes. Most people spend most of
>> their time on new/recent messages, so making time, size, or subject
>> based volumes wouldn't be a bad idea.
>>
>> If the bulk of the size is redundant copies of attachments, then
>> Dovecot's *dbox supports de-duping, which would also help.
>
> So, generally speaking, you don't want to have inboxes that just sync all
> day long, due to massive amounts of small files in the inbox. This may be
> OK in the case of a rarely accessed archive folder, but not good for
> regularly accessed inboxes, etc.?

Joseph Tam <jtam.home at gmail.com>

On Fri, 8 May 2020, Joseph Tam wrote:

>>> It depends on what you consider reasonable.

Whoops. Editing error. Here is what I wanted to send.

On Fri, 8 May 2020, asai at globalchangemusic.org wrote:

> So, generally speaking, you don't want to have inboxes that just sync all
> day long, due to massive amounts of small files in the inbox.

I don't know enough about what is involved when your client tries to sync
to comment on your particular situation.

If the exchange of information involves only delta changes (e.g. list
entries that have been added/removed since the last sync), and if this
information is readily available in Dovecot's caches, then this operation
might be optimized to take minimal time.

If, however, it involves exchanging entire lists of many message IDs, or
worse, involves Dovecot accessing each message, it will result in large
amounts of time spent in I/O (network, disk, or both). With Maildir (many
small messages in a folder), this causes seeking all over the disk. Some
filesystems (XFS?) may be better at this than others.

The description of your problem seems to suggest the latter, so breaking
up gigantic mailboxes into manageable volumes will help.

If you really want to see what's going on when a client syncs, you can
network trace, process trace, or use Dovecot's rawlog feature

	https://wiki.dovecot.org/Debugging/Rawlog

to directly observe the interaction between a server and client.

> This may be OK in the case of a rarely accessed archive folder, but not
> good for regularly accessed inboxes, etc.?

This is not really so much technical advice as a rule of thumb: there's
not a lot of payoff to optimizing rare operations.

Joseph Tam <jtam.home at gmail.com>
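
To make the delta-versus-full distinction above concrete, here is a toy
model in Python. It is only a sketch: the class `ToyMailbox`, its change
log, and `compare_sync_cost` are hypothetical names invented for this
illustration, and nothing here reflects how Dovecot or any real IMAP client
implements syncing. It just counts how many items each approach would have
to send.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMailbox:
    """Toy server-side mailbox that records every change in a numbered log."""
    uids: set = field(default_factory=set)
    changelog: list = field(default_factory=list)  # entries: (seq, action, uid)

    def _log(self, action, uid):
        self.changelog.append((len(self.changelog) + 1, action, uid))

    def add(self, uid):
        self.uids.add(uid)
        self._log("add", uid)

    def remove(self, uid):
        self.uids.discard(uid)
        self._log("remove", uid)

    def full_listing(self):
        """Full sync: the client receives every UID, whatever changed."""
        return sorted(self.uids)

    def changes_since(self, seq):
        """Delta sync: the client receives only log entries newer than seq."""
        return [entry for entry in self.changelog if entry[0] > seq]


def compare_sync_cost(n_messages, n_new):
    """Return (items sent by a full sync, items sent by a delta sync)."""
    box = ToyMailbox()
    for uid in range(1, n_messages + 1):
        box.add(uid)
    last_seen = len(box.changelog)  # the client is fully synced at this point
    for uid in range(n_messages + 1, n_messages + n_new + 1):
        box.add(uid)
    return len(box.full_listing()), len(box.changes_since(last_seen))


if __name__ == "__main__":
    full, delta = compare_sync_cost(100_000, 5)
    print(f"full listing: {full} items, delta: {delta} items")
```

In this model the full listing grows with the mailbox while the delta grows
only with the number of changes, which is why the size of the mailbox
matters so much more in the second case described above.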

On 08 May 2020, at 12:54, asai at globalchangemusic.org wrote:

>> It depends on what you consider reasonable.
>>
>> The processing time of file operations that iterate through a mailbox
>> will generally go up proportionately with size. If you do a text search
>> without some indexing system like Solr, it will take a very long time.
>>
>> If the mailbox is just some archive that you pile up and forget about
>> except for once-in-a-blue-moon retrieval, then it might be reasonable.
>>
>> If it's an active mailbox, it will be a pain to navigate, in the same
>> way as a single folder with 100K files or a file cabinet with huge stacks
>> of envelopes.
>>
>> I would guess some partitioning of the large mailboxes into smaller
>> mailboxes would help with active mailboxes. Most people spend most of
>> their time on new/recent messages, so making time, size, or subject
>> based volumes wouldn't be a bad idea.
>>
>> If the bulk of the size is redundant copies of attachments, then
>> Dovecot's *dbox supports de-duping, which would also help.
>
> So, generally speaking, you don't want to have inboxes that just sync all
> day long, due to massive amounts of small files in the inbox. This may be
> OK in the case of a rarely accessed archive folder, but not good for
> regularly accessed inboxes, etc.?

Not really, since most GUI clients keep all the folders synced, so moving
files to different, smaller-count mailboxes doesn't reduce the number of
files accessed.

The issue is that if you have a folder with millions of files in it, most
file systems don't deal well with this. But with mbox, each "folder" is a
single file, and making a single multi-GB text file that has to be parsed
is definitely an issue on any file system.

-- 
ALL WORK AND NO PLAY MAKES BART A DULL BOY
ALL WORK AND NO PLAY MAKES BART A DULL BOY
ALL WORK AND NO PLAY MAKES BART A DULL BOY
Bart chalkboard Ep. 1F07
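
For anyone wanting to check whether a Maildir store actually has folders
with very large file counts, a rough Python sketch follows. It assumes the
usual Maildir layout, where each folder is a directory containing `cur/`,
`new/` and `tmp/`. The function names `count_maildir_messages` and
`oversized` are made up for this example, and the threshold is arbitrary.
It only counts files and does not apply to mbox, where each folder is a
single file.

```python
import os


def count_maildir_messages(root):
    """Map each Maildir folder under root to its number of message files.

    A directory is treated as a Maildir folder if it contains both
    cur/ and new/ (an assumption about the on-disk layout).
    """
    counts = {}
    for dirpath, dirnames, _ in os.walk(root):
        if "cur" in dirnames and "new" in dirnames:
            total = 0
            for sub in ("cur", "new"):
                with os.scandir(os.path.join(dirpath, sub)) as entries:
                    total += sum(1 for entry in entries if entry.is_file())
            counts[os.path.relpath(dirpath, root)] = total
            # Prune the message directories so os.walk does not descend
            # into them; nested folders are still visited.
            dirnames[:] = [d for d in dirnames
                           if d not in ("cur", "new", "tmp")]
    return counts


def oversized(counts, threshold):
    """Return (count, folder) pairs above threshold, largest first."""
    return sorted(((n, folder) for folder, n in counts.items()
                   if n > threshold), reverse=True)


if __name__ == "__main__":
    import sys

    # Usage: python count_maildir.py /path/to/Maildir
    for n, folder in oversized(count_maildir_messages(sys.argv[1]), 50_000):
        print(f"{n:>10}  {folder}")
```

The folder named `.` in the output is the top-level mailbox itself (the
inbox in a Maildir++ layout).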