Volker Dose
2005-Dec-15 14:09 UTC
Strange delay in syscall close() on a large ext3-filesystem
Dear Mailinglist, this morning I had a very strange problem with my SuSE Groupware-Server SLOX 4.1. The webmail-access was very slow and even with thunderbird the access to my mailboxes was not okay. I rebooted the whole system, but the problem still was there. I thought, this may be a problem with cyrusd, but I realized, that the error was somewhere deeper in the system, maybe in the filesystem. The cyrusd uses a directory /var/spool/imap/user to save the emails in seperate files (on each mail). When I was trying to open a mail (../user/<MYNAME>/723. ) with the "cat"-command I saw a delay at the end of displaying the file. When I used the "strace"-command to see, which functions are used, I saw, that the one last systemcall was delayed: # /root/strace cat 212966. [... the mailtext is displayed in a usual speed....] [...] ) = 3021 read(3, "", 4096) = 0 close(3 AND NOW there is a delay. After a second or so the close is finished and the screen lookes like this: ) = 3021 read(3, "", 4096) = 0 close(3) = 0 _exit(0) = ? The last line and the half of the pre-lats now is shown. What me really wonders, is that there is only a delay - no errors are in any logfile (accpet in webmail.log, "broken pipe"). The system has the following partitions: slox:/var/spool/imap/alarm-virus # mount /dev/sda6 on / type ext3 (rw) proc on /proc type proc (rw) devpts on /dev/pts type devpts (rw,mode=0620,gid=5) /dev/sdb1 on /var type ext3 (rw,noatime,data=writeback) shmfs on /dev/shm type shm (rw) slox:/var/spool/imap/alarm-virus # df Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda6 30811104 16479040 12766936 57% / /dev/sdb1 282774360 122292576 146117668 46% /var shmfs 1940876 0 1940876 0% /dev/shm I took a look with tune2fs, but this seems to be okay, right? --- snip --- slox:~ # tune2fs -l /dev/sdb1 tune2fs 1.28 (31-Aug-2002) Filesystem volume name: <none> Last mounted on: <not available> Filesystem UUID: 1f328532-cb23-4c38-bf54-41725fbf89cf Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal filetype needs_recovery sparse_super Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 35913728 Block count: 71820582 Reserved block count: 3591029 Free blocks: 40854579 Free inodes: 35259009 First block: 0 Block size: 4096 Fragment size: 4096 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 16384 Inode blocks per group: 512 Last mount time: Thu Dec 8 19:39:48 2005 Last write time: Thu Dec 8 19:39:48 2005 Mount count: 1 Maximum mount count: 38 Last checked: Thu Dec 8 18:34:47 2005 Check interval: 15552000 (6 months) Next check after: Tue Jun 6 19:34:47 2006 Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 128 Journal UUID: <none> Journal inode: 8 Journal device: 0x0000 First orphan inode: 18677862 ---- snip --- On that particular filesystem ar a lot of mostly small files: --- snip --- slox:/var # find . -type f|wc 576393 669590 18992660 --- snip --- I did the following workaround to solve the problem for now: moving the "user"-dir to /var and make a symlink to /var/spool/imap - this did work. Now (in my opinion) I am using another "way" (inode) to access the files and the speed increased to usual behavior. Is it possible, the the inode-chain below /var/spool is damaged in a way? It seems to me, that now, after the move, there must be another inode-chain when accessing the mail-files. I did a fsck a few days later while a reboot but there was nothing remarkable at all. It is maybe important, that /dev/sdb1 is a RAID5 on a Dell-Server 2850, dual XEON 3Ghz. Any help would be appreciated. With kind regards, Volker Dose -- Die eCONNEX AG bietet wirksame IT-L?sungen. Weitere Informationen finden Sie unter www.econnex.de. ____________________________________________ eCONNEX AG Volker Dose (Netzwerkadministration) D?nische Stra?e 15 - 24103 Kiel Tel 0431 59369 0 - Fax 0431 59369 19 Valentinskamp 24 - 20354 Hamburg Tel 040 31112 903 - Fax 040 31112 200 _____________________________________________________
evilninja
2005-Dec-15 18:59 UTC
Strange delay in syscall close() on a large ext3-filesystem
Volker Dose schrieb:> > ) = 3021 > read(3, "", 4096) = 0 > close(3) = 0 > _exit(0) = ?hm, when i do "strace cat <FILE>" there's a "close(1)" added: % strace cat file.txt [...] ) = 305 read(3, "", 4096) = 0 close(3) = 0 close(1) = 0 exit_group(0) = ? not sure why, but strace shows always this close(1), no matter if i do this on ext3/jfs/nfs. which kernel are you using?> slox:~ # tune2fs -l /dev/sdb1 > tune2fs 1.28 (31-Aug-2002)how comes e2fsprogs is *that* old on SLOX 4.1? -- BOFH excuse #145: Flat tire on station wagon with tapes. ("Never underestimate the bandwidth of a station wagon full of tapes hurling down the highway" Andrew S. Tannenbaum)
evilninja at gmx.net
2005-Dec-16 15:13 UTC
Strange delay in syscall close() on a large ext3-filesystem
Volker Dose wrote:> Good Morning, > > If I do that strace cat <FILE> on another machine, I also see a close(1):ok. i don't know if this is important at all, i just noticed it.> > > --- snip --- > write(2, "\n", 1 > ) = 1 > close(1) = 0 > exit_group(1) = ? > --- snip --- > > And on another cyrus-mailserver close(3) and close(1): > > --- snip --- > ) = 1723 > read(3, "", 4096) = 0 > close(3) = 0 > close(1) = 0 > exit_group(0) = ? > --- snip --- > > > What exactly do this numbers mean? Filedescriptor? Isn't 1,2 and 3 for > STDIN,STDOUT and STDERR? > > The tune2fs-binary is just what you get, when youo install an SLOX and do > the usual updates: > > --- snip --- > slox:~ # rpm -qf /sbin/tune2fs > e2fsprogs-1.28-30 > --- snip --- > > On my SLES-9 cyrus-mailserver: > mail:/var/spool/imap/user/volker # rpm -qf /sbin/tune2fs > e2fsprogs-1.36-6.2 > --- snip --- > > > The kernen-version on SLOX is: > --- snip --- > slox:~ # uname -a > Linux slox 2.4.21-295-smp #1 SMP Wed Jun 29 16:45:30 UTC 2005 i686 unknown > --- snip --- > > SLOX is based on United Linux 1, as you can see, a quite old distribution. I > think it ios very close to SLES 8.i really don't know what could've caused the long delays in close(). i'd compile e2fsprogs-1.39-WIP-1210.tar.gz just to be sure no errors on the fs are missed. please respond on-list, so that the e2fs-gurus can comment.... Christian. -- BOFH excuse #25: Decreasing electron flux