Hi all,

I'm sure my issues are a result of misconfiguration, but I'm hoping someone can point me in the right direction. I'm getting pressure to move us back to GroupWise, which I desperately want to avoid :-/

We're running dovecot 1.2.9 on Ubuntu 10.04 LTS with postfix. The server is a VM with 1 vCPU and 4GB of RAM. We serve about 10,000 users, with anywhere from 500-1000 logged in at any one time. Messages are stored in Maildir format on two NFS servers (one for staff, the other for students).

Today I implemented the "High performance" setup described here: http://wiki.dovecot.org/NFS (mainly moving indexes off of NFS, since I'm only using the one server).

I also added imapproxy to our webmail client server (SOGo). The vast majority of our users come in over the web.

We currently see load averages spiking into the 20-30 range. When this happens, service crawls to a near standstill, and ultimately the SOGo client starts crashing out.

I'm wondering if anything jumps out at anybody here - feel free to mock if/when you find an obvious configuration problem.
I just want it to work :-)

dovecot -n

# 1.2.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-25-server x86_64 Ubuntu 10.04.1 LTS
log_timestamp: %Y-%m-%d %H:%M:%S
protocols: imap imaps pop3 pop3s managesieve
listen(default): *
listen(imap): *
listen(pop3): *
listen(managesieve): *:2000
ssl_cert_file: /etc/dovecot/certs/mail_nhusd_k12_ca_us.crt
ssl_key_file: /etc/dovecot/certs/mail_nhusd_k12_ca_us.key
disable_plaintext_auth: no
login_dir: /var/run/dovecot/login
login_executable(default): /usr/lib/dovecot/imap-login
login_executable(imap): /usr/lib/dovecot/imap-login
login_executable(pop3): /usr/lib/dovecot/pop3-login
login_executable(managesieve): /usr/lib/dovecot/managesieve-login
login_process_per_connection: no
login_process_size: 512
login_processes_count: 20
login_max_processes_count: 3000
login_max_connections: 64
max_mail_processes: 2048
mail_max_userip_connections(default): 20
mail_max_userip_connections(imap): 20
mail_max_userip_connections(pop3): 10
mail_max_userip_connections(managesieve): 10
mail_access_groups: staffmailusers
mail_privileged_group: dovecot
mail_uid: mail
mail_gid: 502
mail_location: maildir:~/Maildir:INDEX=/var/indexes/%u
mail_nfs_storage: yes
mbox_write_locks: fcntl dotlock
mail_executable(default): /usr/lib/dovecot/imap
mail_executable(imap): /usr/lib/dovecot/imap
mail_executable(pop3): /usr/lib/dovecot/pop3
mail_executable(managesieve): /usr/lib/dovecot/managesieve
mail_plugins(default): acl imap_acl quota imap_quota expire
mail_plugins(imap): acl imap_acl quota imap_quota expire
mail_plugins(pop3):
mail_plugins(managesieve):
mail_plugin_dir(default): /usr/lib/dovecot/modules/imap
mail_plugin_dir(imap): /usr/lib/dovecot/modules/imap
mail_plugin_dir(pop3): /usr/lib/dovecot/modules/pop3
mail_plugin_dir(managesieve): /usr/lib/dovecot/modules/managesieve
namespace:
  type: private
  separator: /
  inbox: yes
  list: yes
  subscriptions: yes
namespace:
  type: shared
  separator: /
  prefix: shared/%%u/
  location: maildir:%%h/Maildir:INDEX=~/Maildir/shared/%%u
  list: children
lda:
  deliver_log_format: %$ -- FROM=%f SUBJECT=%s
  mail_plugins: cmusieve acl expire
  log_path:
  info_log_path:
  syslog_facility: mail
  postmaster_address: postmaster at nhusd.k12.ca.us
  hostname: mail.nhusd.k12.ca.us
  auth_socket_path: /var/run/dovecot/auth-master
auth default:
  passdb:
    driver: pam
  passdb:
    driver: ldap
    args: /etc/dovecot/dovecot-ldap.conf
  userdb:
    driver: ldap
    args: /etc/dovecot/dovecot-ldap.conf
  socket:
    type: listen
    master:
      path: /var/run/dovecot/auth-master
      mode: 384
plugin:
  quota: maildir:User quota
  quota_rule: *:storage=9G
  quota_rule2: Trash:storage=200M
  acl: vfile
  acl_shared_dict: file:/home/staff/dovecot/shared-mailboxes
  expire: Trash 7 Trash/* 7 Spam 30
  expire_dict: proxy::expire
  sieve: ~/.dovecot.sieve
  sieve_dir: ~/sieve
  sieve_extensions: +imapflags
dict:
  expire: mysql:/etc/dovecot/dovecot-dict-expire.conf

--
Chris Hobbs
Director, Technology
New Haven Unified School District

--
This message was scanned by ESVA and is believed to be clean.
On 7.10.2010, at 0.32, Chris Hobbs wrote:

> We currently see load averages spiking into the 20-30 range. When this happens, service crawls to a near standstill, and ultimately the SOGo client starts crashing out.

Is the load CPU load or disk I/O load? If it is I/O load, which NFS operations are peaking, or is it all of them? Pretty graphs of nfsstat output would be nice.

> login_processes_count: 20

You could probably use fewer than 20.

> login_max_connections: 64

And this could be higher. In general you should have maybe 1-2x as many login processes as CPU cores.

> mail_nfs_storage: yes

You said you have only one server accessing mails. So set this to "no".

> mail_location: maildir:~/Maildir:INDEX=/var/indexes/%u
..
> namespace:
>   type: shared
>   separator: /
>   prefix: shared/%%u/
>   location: maildir:%%h/Maildir:INDEX=~/Maildir/shared/%%u

The INDEX path here is wrong now. Also you could try whether maildir_very_dirty_syncs=yes is helpful.
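The two numeric rules of thumb in the reply above (login processes at roughly 1-2x the CPU core count, and mail_nfs_storage only when more than one server touches the mail) can be checked mechanically against `dovecot -n` output. What follows is only a rough sketch in Python, not anything shipped with Dovecot: the function names `parse_dovecot_n` and `review` are made up for illustration, and it assumes the 1.x output format shown in the original post, where nested settings are indented and top-level ones are not.

```python
def parse_dovecot_n(text):
    """Collect top-level 'key: value' lines from `dovecot -n` (1.x style) output.

    Assumption: nested settings (namespace, auth, plugin, ...) are indented,
    so any line starting with whitespace is skipped.
    """
    settings = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment header
        if line[0].isspace():
            continue  # nested setting, ignored in this sketch
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def review(settings, cpu_cores, servers_sharing_mail):
    """Apply the two rules of thumb from the reply; returns a list of notes."""
    notes = []
    login_procs = settings.get("login_processes_count")
    if login_procs is not None and int(login_procs) > 2 * cpu_cores:
        notes.append(
            "login_processes_count=%s is more than 2x the %d CPU core(s)"
            % (login_procs, cpu_cores)
        )
    if settings.get("mail_nfs_storage") == "yes" and servers_sharing_mail == 1:
        notes.append("mail_nfs_storage=yes although only one server accesses the mail")
    return notes


if __name__ == "__main__":
    # Values copied from the original post; 1 vCPU and one Dovecot server.
    sample = "login_processes_count: 20\nmail_nfs_storage: yes\n"
    for note in review(parse_dovecot_n(sample), cpu_cores=1, servers_sharing_mail=1):
        print(note)
```

The sketch deliberately leaves the shared namespace's INDEX path alone, since the reply says it is wrong but does not spell out the replacement value.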
imapproxy can only take you from "doesn't work" to "might as well not work", ime. If at all possible look into a stateful web client.

-bdh

On Oct 6, 2010, at 6:32 PM, Chris Hobbs <chobbs at nhusd.k12.ca.us> wrote:

> I also added imapproxy to our webmail client server (SOGo). The vast majority of our users come in over the web.
Chris,

> -----Original Message-----
> Subject: [Dovecot] Significant performance problems
>
> I'm sure my issues are a result of misconfiguration, but I'm hoping
> someone can point me in the right direction. I'm getting pressure to
> move us back to GroupWise, which I desperately want to avoid :-/
>
> We're running dovecot 1.2.9 on Ubuntu 10.04 LTS+postfix. The server is a
> VM with 1 vCPU and 4GB of RAM. We serve about 10,000 users with anywhere
> from 500-1000 logged in at any one time. Messages are stored in Maildir
> format on two NFS servers (one for staff, the other for students).

Are the webmail interface and IMAP proxy also running on this server?

What does memory utilization look like on the server? How much is being used by applications, and how much is free for filesystem cache? What mount options are you using on your NFS exports (on the NFS client side)?

We run 60k accounts with about 10k concurrent sessions across 12 servers. Each server has 4 cores and 8GB of RAM, and mounts 16 NFS exports spread across two servers. The servers handle close to 1k concurrent sessions each without breaking a load of 1.

The keys seem to be keeping NFS I/O latency down, and allowing the server to cache as much as possible. If the Dovecot server always has to go back to NFS for client data, and the NFS server doesn't have enough memory to cache filesystem metadata and/or enough spindles to access the data in a timely manner, you're going to hit a pain point pretty quick.

Try bumping up the RAM on both servers to 8+GB, and make sure that you don't have any mount options that would prevent the client from caching data - noac for example is a killer. You could also try mounting with noatime, and disabling or turning down speculative readahead on the NFS server.

Have you followed all of your storage vendor's block alignment guidelines when setting up the LUNs and virtual disks?

-Brad

---
Brandon 'Brad' Davidson
Virtualization Systems Administrator
University of Oregon Information Services
(541) 346-8098
brandond at uoregon.edu
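The question about client-side mount options can be answered by reading /proc/mounts on the Dovecot server. The Python sketch below is one possible way to do that, not a tool anyone in the thread used: it flags NFS mounts that carry noac (called a killer above) or that lack noatime (the option Chris later reports adding). The function names, host names and export paths are hypothetical, and it assumes the mount options appear in /proc/mounts the way they were given at mount time.

```python
def nfs_mounts(mounts_text):
    """Yield (mount_point, options) for NFS entries in /proc/mounts format.

    Each line is: device mount_point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            yield fields[1], fields[3].split(",")


def flag_options(mounts_text):
    """Return (mount_point, issue) pairs for the two options discussed above."""
    findings = []
    for mount_point, options in nfs_mounts(mounts_text):
        if "noac" in options:
            # noac turns off attribute caching on the NFS client
            findings.append((mount_point, "noac set"))
        if "noatime" not in options:
            findings.append((mount_point, "noatime not set"))
    return findings


if __name__ == "__main__":
    # Hypothetical entries; on a live system pass open("/proc/mounts").read().
    sample = (
        "nfs1:/export/staff /home/staff nfs rw,noac,vers=3 0 0\n"
        "nfs2:/export/students /home/students nfs rw,noatime,vers=3 0 0\n"
    )
    for mount_point, issue in flag_options(sample):
        print(mount_point, issue)
```

This only inspects the client side; readahead behaviour on the NFS server has to be checked there.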
For documentation's sake, here's what I've done so far:

1) Implemented Timo's fixes to my config file (fixed the shared INDEX path, adjusted the NFS settings for the reality of only one server hitting it).
2) Installed imapproxy on the webmail server at the recommendation of the developers of that product (SOGo).
3) Modified my NFS mount with noatime to reduce I/O hits there. I still need to figure out what Brad's suggestions about readahead on the server mean.
4) Threw gobs of RAM at both the dovecot server (went from 4GB to 8GB) and the NFS server (from 1GB to 8GB). Also cranked up the vCPUs on each to 4.

I hope that's enough to get things working much better tomorrow morning. I'll be back to report or beg for more. I really appreciate the quick responses and helpful advice.

I do have one more idea I'll throw out there. Everything I've got here is virtual. I only have the one Dovecot/Postfix server running now, and the impression I get from you all is that that should be adequate for my load. What would the collective opinion be of simply removing the NFS server altogether and mounting the virtual disk holding my messages directly on the dovecot server? I give up the ability to have a failover dovecot/postfix server, which was my motivation for using NFS in the first place, but a usable system probably trumps a redundant one.

Chris

On 10/6/10 4:32 PM, Chris Hobbs wrote:

> Hi all,
>
> I'm sure my issues are a result of misconfiguration, but I'm hoping
> someone can point me in the right direction. I'm getting pressure to
> move us back to GroupWise, which I desperately want to avoid :-/

--
Chris Hobbs
Director, Technology
New Haven Unified School District
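Whether the changes listed above help depends on the question raised earlier in the thread: is the load CPU or I/O? On Linux the load average also counts tasks blocked in uninterruptible I/O wait, so a load of 20-30 on a single vCPU does not by itself say which it is. One way to tell is to compare two samples of the aggregate `cpu` line in /proc/stat. The following is a minimal Python sketch with made-up function names, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def busy_and_iowait_share(before, after):
    """Return (busy, iowait) as fractions of CPU time between two samples.

    Only the first eight counters (user..steal) go into the total, because
    the later guest counters are already included in user time.
    """
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))][:8]
    total = sum(deltas)
    if total == 0:
        return 0.0, 0.0
    user, nice, system, _idle, iowait = deltas[:5]
    return (user + nice + system) / total, iowait / total


def read_cpu_line():
    """Read the first line of /proc/stat (Linux only)."""
    with open("/proc/stat") as stat_file:
        return stat_file.readline()


def sample(interval=5.0):
    """Take two samples 'interval' seconds apart and return both shares."""
    before = read_cpu_line()
    time.sleep(interval)
    return busy_and_iowait_share(before, read_cpu_line())
```

Calling `sample()` during a load spike returns the two fractions. A large iowait share with a small busy share would point at NFS or disk latency rather than at the vCPU count.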