Looks like maildir can't be used very reliably without quite a lot of locking. Writing and scanning the directory would have to be locked, but reading wouldn't (as long as the file hasn't been renamed, which would require scanning to find it). So much for "no locks needed"..

The problem is that opendir()/readdir() may temporarily not return some files if there have been changes in the directory since the opendir(). That means Dovecot thinks a message is expunged, while in fact it really isn't, and the next scan would usually show it again.

Currently when that happens, Dovecot usually prints an error message about it and rebuilds indexes. Of course, in real life clients aren't often bombing the same mailbox with tons of changes in multiple connections, which is usually needed to trigger this.

I wrote a test program which tests this:

http://dovecot.org/tmp/readdir.c

I'd like to hear if you can run it in some system without errors. I tested Linux 2.4 and 2.6 with ext2, ext3, xfs and reiser3, Solaris 8/ufs and OpenBSD 3.5/sparc64. Only OpenBSD passed the test, but I'm not sure if it's only because the computer was so slow and didn't switch between processes hard enough. I'd be especially interested in FreeBSD and various NFS systems.

If it actually works properly in some systems, I guess I'll make the extra locking configurable.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
URL: <http://dovecot.org/pipermail/dovecot/attachments/20041025/e1498583/attachment-0001.bin>
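To make the failure mode above concrete: a flag change renames the message file (for example 429:2,S becomes 429:2, when the Seen flag is cleared), so a reader has to find a message by the part of the name before the colon, and that means scanning. The following is only a minimal sketch under that assumption. It is not code from Dovecot or from readdir.c, and the names base_len, find_by_base and report_lookup are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length of the maildir base name: everything before the ":2," suffix. */
size_t base_len(const char *name)
{
    const char *colon = strchr(name, ':');
    return colon ? (size_t)(colon - name) : strlen(name);
}

/*
 * One opendir()/readdir() pass looking for an entry whose base name is
 * `base`. Returns a malloc()ed copy of the current file name, or NULL if
 * this pass did not return such an entry (or the directory cannot be read).
 */
char *find_by_base(const char *dirpath, const char *base)
{
    DIR *dir;
    struct dirent *d;
    char *found = NULL;
    size_t blen;

    assert(dirpath != NULL && base != NULL);
    blen = strlen(base);

    dir = opendir(dirpath);
    if (dir == NULL)
        return NULL;

    while ((d = readdir(dir)) != NULL) {
        if (d->d_name[0] == '.')
            continue;
        if (base_len(d->d_name) == blen &&
            strncmp(d->d_name, base, blen) == 0) {
            found = strdup(d->d_name);
            break;
        }
    }
    closedir(dir);
    return found;
}

/* Print the outcome of one lookup; returns 1 if found, 0 otherwise. */
int report_lookup(const char *dirpath, const char *base)
{
    char *name = find_by_base(dirpath, base);

    if (name == NULL) {
        /* A miss only says what this one scan returned. */
        printf("%s: not returned by this scan\n", base);
        return 0;
    }
    printf("%s -> %s\n", base, name);
    free(name);
    return 1;
}
```

A NULL result here is exactly the ambiguous case described above: the message may really be gone, or a concurrent rename may have hidden it from this one pass, so a caller that treats NULL as "expunged" without rescanning or locking can be wrong.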
Timo Sirainen wrote:
> http://dovecot.org/tmp/readdir.c
>
> I'd like to hear if you can run it in some system without errors. I
> tested Linux 2.4 and 2.6 with ext2, ext3, xfs and reiser3, Solaris
> 8/ufs and OpenBSD 3.5/sparc64. Only OpenBSD passed the test, but I'm
> not sure if it's only because the computer was so slow and didn't
> switch between processes hard enough. I'd be especially interested
> in FreeBSD and various NFS systems.

FreeBSD 4.10 stable:
- To home directory NFS mounted from a NetApp: about 3000 error lines
- To /tmp (ufs on ATA disk): 15 errors

Solaris 8/NFS to the NetApp:
- 470 errors
On Mon, 25 Oct 2004 03:02:31 +0300 Timo Sirainen <tss at iki.fi> wrote:

:Looks like maildir can't be used very reliably without quite a lot of
:locking. Writing and scanning the directory would have to be locked,
:but reading wouldn't (as long as the file hasn't been renamed which
:would require scanning to find it). So much for "no locks needed"..
:
:The problem is that opendir()/readdir() may temporarily not return some
:files if there have been changes in the directory since the opendir().
:That means Dovecot thinks a message is expunged, while in fact it
:really isn't, and the next scan would usually show it again.
:
:Currently when that happens, Dovecot usually prints an error message
:about it and rebuilds indexes. Of course, in real life clients aren't
:often bombing the same mailbox with tons of changes in multiple
:connections, which is usually needed to trigger this.
:
:I wrote a test program which tests this:
:
:http://dovecot.org/tmp/readdir.c
:
:I'd like to hear if you can run it in some system without errors. I
:tested Linux 2.4 and 2.6 with ext2, ext3, xfs and reiser3, Solaris
:8/ufs and OpenBSD 3.5/sparc64. Only OpenBSD passed the test, but I'm
:not sure if it's only because the computer was so slow and didn't
:switch between processes hard enough. I'd be especially interested
:in FreeBSD and various NFS systems.
:
:If it actually works properly in some systems, I guess I'll make the
:extra locking configurable.
:

gir.theapt.org:phessler@/usr/home/phessler> ./readdir
28751: File re-appeared: -> 983:2,S (2)
4017: File re-appeared: -> 707:2,S (2)
4017: File re-appeared: -> 724:2, (2)
4017: File re-appeared: -> 800:2,S (2)
30179: File re-appeared: 429:2,S -> 429:2, (2)

This is on OpenBSD/macppc -current, the partition is mounted as:

/dev/wd0h on /usr/home type ffs (local, noatime, nodev, nosuid, softdep)

-- 
These days the necessities of life cost you about three times what they used to, and half the time they aren't even fit to drink.
Hi,

All of them failed on local (ffs, softdep) and NFS (hosted on OpenBSD 3.6):

OpenBSD 3.6 i386
OpenBSD 3.6 amd64
OpenBSD 3.1 i386
Darwin 7.5.0 Power Macintosh powerppc
NetBSD 2.99.10 i386
On 25.10.2004, at 11:04, Peter Evans wrote:
> Are you sure it's maildir that is the issue, I am sure God Emperor DJB
> would have issues with any such statement maligning the holy grail.
>
> The issue looks to me like it is attribute caching, and thus not
> actually specific to Maildir, just aggravated by it.

Well, it's not a maildir issue in itself, but maildir more or less relies on readdir() not losing renamed files. I'm not sure what you mean by attribute caching, because it's not just with NFS.

I'm sure if you point that out to DJB he'll just say it's not a real problem, because it hardly ever happens except with some stress test tools, or mailboxes with _many_ simultaneous users. Or maybe with a heavily loaded server. Or ...

In any case, I'd rather make Dovecot resistant to this problem entirely than hope that everyone uses maildirs only in optimal conditions.
On Mon, 2004-10-25 at 03:02 +0300, Timo Sirainen wrote:
> Looks like maildir can't be used very reliably without quite a lot of
> locking. Writing and scanning the directory would have to be locked,
> but reading wouldn't (as long as the file hasn't been renamed which
> would require scanning to find it). So much for "no locks needed"..
>

Here is the output from a Solaris 9 sparc box, which provides home directories and the imaps service for clients. Maildir is in the home directory. The maildir is mounted via NFS by mail clients.

See attached. I do not know what this means. Is there a problem?

Thanks,
Alex

-------------- next part --------------
[root at mcsrv5 /tmp]# /tmp/readdir
17394: File re-appeared: -> 346:2,S (2)
17394: File re-appeared: 384 -> 384:2,S (2)
17400: File re-appeared: 385 -> 385:2,S (2)
17401: File re-appeared: 678 -> 678:2,S (2)
17397: File re-appeared: 678 -> 678:2,S (2)
17390: File re-appeared: 678 -> 678:2,S (2)
17396: File re-appeared: 678 -> 678:2,S (2)
17402: File re-appeared: 678 -> 678:2,S (2)
17399: File re-appeared: 678 -> 678:2,S (2)
17393: File re-appeared: 678 -> 678:2,S (2)
17393: File re-appeared: 670 -> 670:2,S (2)
17401: File re-appeared: 373:2,S -> 373:2, (2)
17401: File re-appeared: 677 -> 677:2,S (2)
17402: File re-appeared: 677 -> 677:2,S (2)
17393: File re-appeared: 677 -> 677:2,S (2)
17396: File re-appeared: 677 -> 677:2,S (2)
17401: File re-appeared: 862:2,S -> 862:2, (2)
17396: File re-appeared: 862:2,S -> 862:2, (2)
17395: File re-appeared: 862:2,S -> 862:2, (2)
17402: File re-appeared: 862:2,S -> 862:2, (2)
17393: File re-appeared: 862:2,S -> 862:2, (2)
17393: File re-appeared: 868:2,S -> 868:2, (2)
17390: File re-appeared: 862:2,S -> 862:2, (2)
17402: File re-appeared: 868:2,S -> 868:2, (2)
17390: File re-appeared: 868:2,S -> 868:2, (2)
17402: File re-appeared: 828 -> 828:2,S (2)
17397: File re-appeared: 828 -> 828:2,S (2)
17400: File re-appeared: 828 -> 828:2,S (2)
17390: File re-appeared: 828 -> 828:2,S (2)
17401: File re-appeared: 828 -> 828:2,S (2)
[root at mcsrv5 /tmp]#
On 2004.10.25 02:02, Timo Sirainen wrote:
> Looks like maildir can't be used very reliably without quite a lot of
> locking. Writing and scanning the directory would have to be locked,
> but reading wouldn't (as long as the file hasn't been renamed which
> would require scanning to find it). So much for "no locks needed"..
>
> The problem is that opendir()/readdir() may temporarily not return some
> files if there have been changes in the directory since the opendir().
> That means Dovecot thinks a message is expunged, while in fact it
> really isn't, and the next scan would usually show it again.
>
> Currently when that happens, Dovecot usually prints an error message
> about it and rebuilds indexes. Of course, in real life clients aren't
> often bombing the same mailbox with tons of changes in multiple
> connections, which is usually needed to trigger this.
>
> I wrote a test program which tests this:
>
> http://dovecot.org/tmp/readdir.c

Well, there is a lockless way to do readdir, but it would mean buffering the entire readdir output in memory.

    do {
        /* Wait until directory is quiescent */
        while (1) {
            time(&t);
            /* chmod() invalidates the NFS attribute cache */
            fstat(dirfd(dirhandle), &st);
            fchmod(dirfd(dirhandle), st.st_mode);
            fstat(dirfd(dirhandle), &st);
            if (st.st_mtime < t)
                break;
            usleep(100000);
        }
        t = st.st_mtime;
        free_list(&files);
        rewinddir(dirhandle);   /* start each attempt from the top */
        while ((d = readdir(dirhandle)) != NULL) {
            add_list(&files, ...);
            .. build list in memory ..
        }
        fstat(dirfd(dirhandle), &st);
        fchmod(dirfd(dirhandle), st.st_mode);
        fstat(dirfd(dirhandle), &st);
    } while (st.st_mtime > t);

(insert sleep() or usleep() where necessary).

This basically just retries until you've done an entire readdir() on the directory without mtime changes.

At least under Linux, chmod invalidates the NFS attribute cache. On normal files, fcntl() locking should do the same, but I'm not sure you can lock directories, and not all servers do NFS locking, so chmod is probably the better choice. Testing would be needed, though.

Mike.
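For anyone who wants to experiment with the retry idea from the message above, here is a compilable sketch of it. It is an interpretation, not Mike's code: the function names (fresh_mtime, read_names, free_names, stable_scan), the max_tries limit and the 50-nap bound on the quiescence wait are all assumptions added for illustration, and nanosleep() stands in for usleep(). It also inherits the approach's limits, namely one-second st_mtime granularity and reasonably synchronized clocks between client and server.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* fstat() the directory after an fchmod() to its current mode. The fchmod()
 * is the NFS attribute-cache trick from the message above (best effort). */
int fresh_mtime(int fd, time_t *mtime)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    (void)fchmod(fd, st.st_mode & 07777);
    if (fstat(fd, &st) < 0)
        return -1;
    *mtime = st.st_mtime;
    return 0;
}

void free_names(char **names, size_t n)
{
    while (n > 0)
        free(names[--n]);
    free(names);
}

/* Read all non-dot names from the start of dir. Returns 0, or -1 on ENOMEM. */
int read_names(DIR *dir, char ***out, size_t *count)
{
    struct dirent *d;
    char **names = NULL;
    size_t n = 0, cap = 0;

    rewinddir(dir);
    while ((d = readdir(dir)) != NULL) {
        if (d->d_name[0] == '.')
            continue;
        if (n == cap) {
            size_t ncap = cap ? cap * 2 : 16;
            char **tmp = realloc(names, ncap * sizeof(*tmp));
            if (tmp == NULL)
                goto fail;
            names = tmp;
            cap = ncap;
        }
        if ((names[n] = strdup(d->d_name)) == NULL)
            goto fail;
        n++;
    }
    *out = names;
    *count = n;
    return 0;
fail:
    free_names(names, n);
    return -1;
}

/* Returns the number of names stored in *out, or -1 on error or when no
 * change-free scan succeeded within max_tries attempts. */
long stable_scan(const char *path, char ***out, int max_tries)
{
    const struct timespec nap = { 0, 100000000L };  /* 100 ms */
    DIR *dir;
    long result = -1;

    assert(path != NULL && out != NULL);
    if ((dir = opendir(path)) == NULL)
        return -1;

    while (result < 0 && max_tries-- > 0) {
        time_t before, after;
        char **names;
        size_t n;
        int naps = 0;

        /* Wait until the last change is at least one clock second old. */
        for (;;) {
            if (fresh_mtime(dirfd(dir), &before) < 0)
                goto done;
            if (before < time(NULL))
                break;
            if (++naps > 50)
                goto done;
            nanosleep(&nap, NULL);
        }
        if (read_names(dir, &names, &n) < 0)
            goto done;
        /* Accept the list only if mtime did not move while we were reading. */
        if (fresh_mtime(dirfd(dir), &after) == 0 && after == before) {
            *out = names;
            result = (long)n;
        } else {
            fprintf(stderr, "%s changed during scan, retrying\n", path);
            free_names(names, n);
        }
    }
done:
    closedir(dir);
    return result;
}
```

A caller would do something like `n = stable_scan("cur", &names, 5)`, use names[0..n-1], and release them with free_names(names, n). On a mailbox that changes every second the wait never finishes within the bound, so the function gives up and returns -1 rather than spinning forever; whether that trade-off is acceptable is part of the testing that would be needed.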
Timo Sirainen <tss at iki.fi> writes:
> Looks like maildir can't be used very reliably without quite a lot of
> locking. Writing and scanning the directory would have to be locked,
> but reading wouldn't (as long as the file hasn't been renamed which
> would require scanning to find it). So much for "no locks needed"..
>
> The problem is that opendir()/readdir() may temporarily not return some
> files if there have been changes in the directory since the opendir().
> That means Dovecot thinks a message is expunged, while in fact it
> really isn't, and the next scan would usually show it again.

I'm not sure if the claims are about locking-free scanning (but I believe DJB of Bold Yet Hollow Announcements fame just touted "no locks"); one point is locking-free delivery, because if opendir/readdir misses a _new_ file, no harm is done.

qmail is so full of bugs I effectively stopped maintaining my qmail-bugs page <http://home.pages.de/~mandree/qmail-bugs.html>, because I grew tired of researching bugs of a system I stopped using years ago, and of wackos refuting the bugs. I only recently found out that qmail-pop3d doesn't get article sizes (in LIST) right. Shame on DJB for claiming efficiency and standards compliance when his nutshell is rather a shipwreck, and has been unmaintained for six years...

-- 
Matthias Andree
Timo Sirainen <tss at iki.fi> writes:
> I wrote a test program which tests this:
>
> http://dovecot.org/tmp/readdir.c

Test results:

File re-appeared messages on SuSE Linux 9.1 x86 (Kernel 2.6.5 patched by SuSE) on these file systems (all local; fast machine, Athlon XP 2500+):

  tmpfs
  xfs
  ext3
  reiserfs

File re-appeared messages on Solaris 9 x86 (fast machine):

  swap (/tmp)
  ufs logging (/var/tmp)
  nfs (from above Linux server)
  nfs (from below FreeBSD server)

File re-appeared messages on FreeBSD 4.10-RELEASE-p3 x86 (slow machine, K6-2/300):

  mfs (/tmp)
  nfs (from above Linux server)
  nfs (from above Solaris server)
  ufs softupdates (/var/tmp)

So I'd think I have nothing that "works" for your application profile, unfortunately.

-- 
Matthias Andree
This is nonsense. The problem is that the behavior of readdir() is confusing.

Why should unlink() or rename() invalidate data that your C library ALREADY READ from the directory?

This is like saying "I fgetc()'d a byte, but now lseek() shows that my offset is 1024!" - it's silly.

Just put a:

    if (stat(d->d_name, &sb) == -1)
        continue;

after your check for the "." in the first character of the d->d_name (about line 41) and all will be good. No amount of twiddling with USE_UNLINK or FILES is going to affect it.

So you say, "I need to stat() each entry? That's going to create a large number of syscalls!"

Of course. For readdir() to be atomic, it would need to do a system call for each directory entry. This is exactly why readdir() doesn't, so that you do one syscall for every (say) 50 entries, and if you want validity, you'll do a stat() yourself.

Now: Maildir quite obviously wasn't designed with IMAP in mind. IMAP has some (largely ridiculous) requirements that Maildir simply doesn't make easy.

The largest problem (with Maildir) is this renaming of file identifiers and moving things in and out of cur/. It's only necessary so programs don't have to open() in order to read flags (after all, they JUST did a readdir())...

Since the names aren't going to change in cur/, you can get away with just doing a stat() in there [[ after all, you just rename()'d it into cur/ if you're working on new ]]

Unfortunately, cur/ is often bigger than new/.

On Sun, 2004-10-24 at 20:02, Timo Sirainen wrote:
> Looks like maildir can't be used very reliably without quite a lot of
> locking. Writing and scanning the directory would have to be locked,
> but reading wouldn't (as long as the file hasn't been renamed which
> would require scanning to find it). So much for "no locks needed"..
>
> The problem is that opendir()/readdir() may temporarily not return some
> files if there have been changes in the directory since the opendir().
> That means Dovecot thinks a message is expunged, while in fact it
> really isn't, and the next scan would usually show it again.
>
> Currently when that happens, Dovecot usually prints an error message
> about it and rebuilds indexes. Of course, in real life clients aren't
> often bombing the same mailbox with tons of changes in multiple
> connections, which is usually needed to trigger this.
>
> I wrote a test program which tests this:
>
> http://dovecot.org/tmp/readdir.c
>
> I'd like to hear if you can run it in some system without errors. I
> tested Linux 2.4 and 2.6 with ext2, ext3, xfs and reiser3, Solaris
> 8/ufs and OpenBSD 3.5/sparc64. Only OpenBSD passed the test, but I'm
> not sure if it's only because the computer was so slow and didn't
> switch between processes hard enough. I'd be especially interested
> in FreeBSD and various NFS systems.
>
> If it actually works properly in some systems, I guess I'll make the
> extra locking configurable.

-- 
Geo Carncross <geocar at internetconnection.net>
Internet Connection Reliable Web Hosting
http://www.internetconnection.net/
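The stat()-per-entry suggestion in the message above can be sketched as a small scan loop. This is only an illustration under stated assumptions: scan_live_entries, entry_fn and print_entry are invented names, and fstatat() relative to the open directory is used in place of the one-line stat(d->d_name, ...) so that the code does not depend on the current working directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Callback invoked for every entry that still exists when stat()ed. */
typedef void (*entry_fn)(const char *name, const struct stat *sb, void *ctx);

/* Example callback: print the name and size of each live entry. */
void print_entry(const char *name, const struct stat *sb, void *ctx)
{
    (void)ctx;
    printf("%s (%lld bytes)\n", name, (long long)sb->st_size);
}

/*
 * Scan dirpath and report only entries that can still be stat()ed.
 * Returns the number of live entries, or -1 if the directory cannot
 * be opened. fn may be NULL if only the count is wanted.
 */
long scan_live_entries(const char *dirpath, entry_fn fn, void *ctx)
{
    DIR *dir;
    struct dirent *d;
    struct stat sb;
    long live = 0;

    assert(dirpath != NULL);
    dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    while ((d = readdir(dir)) != NULL) {
        if (d->d_name[0] == '.')
            continue;
        /* The name may come from a buffer filled before a concurrent
         * rename() or unlink(); skip it if it no longer exists. */
        if (fstatat(dirfd(dir), d->d_name, &sb, 0) == -1)
            continue;
        if (fn != NULL)
            fn(d->d_name, &sb, ctx);
        live++;
    }
    closedir(dir);
    return live;
}
```

Note what the check can and cannot do: it drops names that readdir() returned but that have since been renamed or unlinked, at the cost of one extra syscall per entry. It has no way to add a name that a given pass of readdir() never returned.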