I've been trying to convert an existing set of mbox mail to maildir using dsync, but it seems to fail when it gets to the large boxes. $ dsync -Dv mirror maildir:~/Maildir Everything seems to work reasonably well until the first of the large boxes hits: dsync(jeff): Warning: Maildir /home/jeff/Maildir/.cron.2010-07-10: Synchronization took 1210 seconds (102289 new msgs, 0 flag change attempts, 0 expunge attempts) dsync(jeff): Warning: Transaction log file /home/jeff/Maildir/.cron.2010-07-10/dovecot.index.log was locked for 1210 seconds After this, the doveadm process goes from ~75% of the CPU core utilization to 100% and stays there, even overnight. I've also tried $ dsync -Dv backup maildir:~/Maildir with similar results. While I did see some threads on dsync speed, they seemed to be minutes for a few GB of data, not hours. There is plenty of space on the drive, and, from what I understand, there is not a file-count limit on FreeBSD's ufs (at least not one that I'm anywhere near) Any suggestions? Thanks! Jeff [jeff at mail ~]$ du -h -d 0 ~/mail 3.9G /home/jeff/mail [jeff at mail ~]$ ls -lh /var/mail/jeff -rw------- 1 jeff jeff 554M Apr 23 11:27 /var/mail/jeff [jeff at mail ~]$ df -h Filesystem Size Used Avail Capacity Mounted on /dev/ada0s2g 38G 18G 16G 53% / (Note that the hostname and the location of the ssl certificate has been manually edited for this message in the following output) From dmesg.boot: Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.0-STABLE #0: Sat Apr 21 17:13:07 PDT 2012 root at my.host.name:/usr/obj/usr/src/sys/GENERIC i386 CPU: Intel(R) Atom(TM) CPU 330 @ 1.60GHz (1600.03-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x106c2 Family = 6 Model = 1c Stepping = 2 Features=0xbfe9fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x40e31d<SSE3,DTES64,MON,DS_CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE> AMD Features=0x20100000<NX,LM> AMD Features2=0x1<LAHF> TSC: P-state invariant, performance statistics real memory = 2147483648 (2048 MB) avail memory = 2077757440 (1981 MB) [jeff at mail ~]$ doveconf -n # 2.1.3: /usr/local/etc/dovecot/dovecot.conf # OS: FreeBSD 9.0-STABLE i386 auth_mechanisms = plain login disable_plaintext_auth = no first_valid_gid = 1000 first_valid_uid = 1000 listen = * mail_location = mbox:~/mail/:INBOX=/var/mail/%u mail_privileged_group = mail namespace { inbox = yes location prefix separator = / } passdb { args = session=yes dovecot driver = pam } protocols = imap service auth { unix_listener /var/spool/postfix/private/auth { group = postfix mode = 0660 user = postfix } unix_listener auth-master { mode = 0600 } user = root } service imap-login { inet_listener imap { port = 0 } process_limit = 200 } ssl = required ssl_cert = </path/to/certificate.pem ssl_key = </path/to/certificate.pem userdb { args = blocking=yes driver = passwd } verbose_proctitle = yes protocol imap { imap_client_workarounds = tb-extra-mailbox-sep }
On 4/23/12 12:56 PM, Jeff Kletsky wrote:> I've been trying to convert an existing set of mbox mail to maildir > using dsync, but it seems to fail when it gets to the large boxes. > > $ dsync -Dv mirror maildir:~/Maildir > > Everything seems to work reasonably well until the first of the large > boxes hits: > > dsync(jeff): Warning: Maildir /home/jeff/Maildir/.cron.2010-07-10: > Synchronization took 1210 seconds (102289 new msgs, 0 flag change > attempts, 0 expunge attempts) > dsync(jeff): Warning: Transaction log file > /home/jeff/Maildir/.cron.2010-07-10/dovecot.index.log was locked for > 1210 seconds > > After this, the doveadm process goes from ~75% of the CPU core > utilization to 100% and stays there, even overnight. >I believe that some of problem may be that there were a handful of boxes named with "." in them. These were not "flagged" in the conversion process in any way. http://wiki2.dovecot.org/Migration/MailFormat indicates that you can't have a "." in the name of a Maildir box (I dutifully stopped reading that page at "If you can successfully use dsync, you can skip the rest of this page.") In my case, it may be a contributing factor that there were boxes that ended up "stacking" inside each other: * cron * cron.2010-07-10 * cron.2010-07-10.remaining Still not blisteringly fast, but dsync(jeff): Warning: Maildir /home/jeff/Maildir/.cron_2010-07-10: Synchronization took 686 seconds (102289 new msgs, 0 flag change attempts, 0 expunge attempts) dsync(jeff): Warning: Transaction log file /home/jeff/Maildir/.cron_2010-07-10/dovecot.index.log was locked for 686 seconds is certainly better. I'm still seeing 100% CPU utilization, with the tail of ls -Altr in ~/Maildir showing drwx------ 5 jeff jeff 512 Apr 23 18:19 .XDA drwx------ 5 jeff jeff 512 Apr 23 18:19 .Spam drwx------ 5 jeff jeff 512 Apr 23 18:42 .Health drwx------ 5 jeff jeff 512 Apr 23 18:42 .cron_2010-07-10_remaining drwx------ 5 jeff jeff 512 Apr 23 18:58 .cron_2010-07-10 It appears from looking in the .cron_2010-07-10 directory that indexing is the action underway [jeff at mail ~/Maildir]$ ls -Altr .cron_2010-07-10 total 27 drwx------ 2 jeff jeff 512 Apr 23 18:15 new -rw------- 1 jeff jeff 0 Apr 23 18:15 maildirfolder drwx------ 2 jeff jeff 5819392 Apr 23 18:42 tmp drwx------ 2 jeff jeff 5237248 Apr 23 18:42 cur -rw------- 1 jeff jeff 4122971 Apr 23 18:42 dovecot-uidlist -rw------- 1 jeff jeff 36 Apr 23 18:42 dovecot-keywords -rw------- 1 jeff jeff 1650764 Apr 23 18:42 dovecot.index.log.2 -rw------- 1 jeff jeff 1637008 Apr 23 18:42 dovecot.index -rw------- 1 jeff jeff 56124 Apr 23 19:04 dovecot.index.log -rw------- 1 jeff jeff 9442304 Apr 23 19:04 dovecot.index.cache -rw------- 1 jeff jeff 24 Apr 23 19:04 dovecot-uidlist.lock (time passes) It is now 20:35 and it has not advanced. I now no longer believe the problem is the mailbox naming, but the indexing. PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 25934 jeff 1 103 0 135M 102M CPU2 2 128:47 100.00% doveadm [jeff at mail ~/Maildir]$ ls -Altr .cron_2010-07-10 total 28 drwx------ 2 jeff jeff 512 Apr 23 18:15 new -rw------- 1 jeff jeff 0 Apr 23 18:15 maildirfolder drwx------ 2 jeff jeff 5819392 Apr 23 18:42 tmp drwx------ 2 jeff jeff 5237248 Apr 23 18:42 cur -rw------- 1 jeff jeff 4122971 Apr 23 18:42 dovecot-uidlist -rw------- 1 jeff jeff 36 Apr 23 18:42 dovecot-keywords -rw------- 1 jeff jeff 1650764 Apr 23 18:42 dovecot.index.log.2 -rw------- 1 jeff jeff 1637008 Apr 23 20:26 dovecot.index -rw------- 1 jeff jeff 289120 Apr 23 20:36 dovecot.index.log -rw------- 1 jeff jeff 9442304 Apr 23 20:36 dovecot.index.cache -rw------- 1 jeff jeff 24 Apr 23 20:36 dovecot-uidlist.lock Any suggestions as to how to proceed? Jeff
On 24.4.2012, at 6.38, Jeff Kletsky wrote:> 25934 jeff 1 103 0 135M 102M CPU2 2 128:47 100.00% doveadmLooks like it could be in an infinite loop. To find out where: 1) strace for a few seconds: strace -tt -p 25934 2>log 2) get gdb backtrace a couple of times (2-5 should be enough, preferably slightly different ones): gdb /usr/local/bin/doveadm 25934 bt full <copy&paste the output to mail> cont <ctrl-c> bt full <copy&paste the output to mail> cont <ctrl-c> ..etc..