thr3ads.net - Ext3 users - Strange delay in syscall close() on a large ext3-filesystem [Dec 2005]

If this information is useful, please help other people find it:
Share via:

Volker Dose

2005-Dec-15 14:09 UTC

Strange delay in syscall close() on a large ext3-filesystem

Dear Mailinglist,

this morning I had a very strange problem with my SuSE Groupware-Server SLOX
4.1.

The webmail-access was very slow and even with thunderbird the access to my
mailboxes was not okay.
I rebooted the whole system, but the problem still was there.

I thought, this may be a problem with cyrusd, but I realized, that the error
was somewhere deeper in the system,
maybe in the filesystem.

The cyrusd uses a directory  /var/spool/imap/user to save the emails in
seperate files (on each mail).

When I was trying to open a mail (../user/<MYNAME>/723. ) with the
"cat"-command I saw a delay at the end of displaying the file.
When I used the "strace"-command to see, which functions are used, I
saw,
that the one last systemcall was delayed:


# /root/strace cat 212966.

[... the mailtext is displayed in a usual speed....]
[...]
) = 3021
read(3, "", 4096)                       = 0
close(3

AND NOW there is a delay. After a second or so the close is finished and the
screen lookes like this:


) = 3021
read(3, "", 4096)                       = 0
close(3)                                = 0
_exit(0)                                = ?


The last line and the half of the pre-lats now is shown.

What me really wonders, is that there is only a delay - no errors are in any
logfile (accpet in webmail.log, "broken pipe").

The system has the following partitions:

slox:/var/spool/imap/alarm-virus # mount
/dev/sda6 on / type ext3 (rw)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sdb1 on /var type ext3 (rw,noatime,data=writeback) shmfs on /dev/shm
type shm (rw)

slox:/var/spool/imap/alarm-virus # df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda6             30811104  16479040  12766936  57% /
/dev/sdb1            282774360 122292576 146117668  46% /var
shmfs                  1940876         0   1940876   0% /dev/shm

I took a look with tune2fs, but this seems to be okay, right?
--- snip ---
slox:~ # tune2fs -l /dev/sdb1
tune2fs 1.28 (31-Aug-2002)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          1f328532-cb23-4c38-bf54-41725fbf89cf
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              35913728
Block count:              71820582
Reserved block count:     3591029
Free blocks:              40854579
Free inodes:              35259009
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Last mount time:          Thu Dec  8 19:39:48 2005
Last write time:          Thu Dec  8 19:39:48 2005
Mount count:              1
Maximum mount count:      38
Last checked:             Thu Dec  8 18:34:47 2005
Check interval:           15552000 (6 months)
Next check after:         Tue Jun  6 19:34:47 2006
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal UUID:             <none>
Journal inode:            8
Journal device:           0x0000
First orphan inode:       18677862

---- snip ---

On that particular filesystem ar a lot of mostly small files:

--- snip ---
slox:/var # find . -type f|wc
 576393  669590 18992660
--- snip ---


I did the following workaround to solve the problem for now:

moving the "user"-dir to /var and make a symlink to /var/spool/imap -
this
did work. Now (in my opinion) I am using another "way" (inode)  to
access
the files and the speed increased to usual behavior.

Is it possible, the the inode-chain below /var/spool is damaged in a way?
It seems to me, that now, after the move, there must be another inode-chain
when accessing the mail-files.

I did a fsck a few days later while a reboot but there was nothing
remarkable at all.

It is maybe  important, that /dev/sdb1 is a RAID5 on a Dell-Server 2850,
dual XEON 3Ghz.

Any help would be appreciated.

With kind regards,
Volker Dose

 --

Die eCONNEX AG bietet wirksame IT-L?sungen.
Weitere Informationen finden Sie unter www.econnex.de.
____________________________________________

eCONNEX AG
Volker Dose (Netzwerkadministration)

D?nische Stra?e 15 - 24103 Kiel
Tel 0431 59369 0 - Fax 0431 59369 19

Valentinskamp 24 - 20354 Hamburg
Tel 040 31112 903 - Fax 040 31112 200
_____________________________________________________

evilninja

2005-Dec-15 18:59 UTC

head link

Strange delay in syscall close() on a large ext3-filesystem

Volker Dose schrieb:> 
> ) = 3021
> read(3, "", 4096)                       = 0
> close(3)                                = 0
> _exit(0)                                = ?
hm, when i do "strace cat <FILE>" there's a
"close(1)" added:

% strace cat file.txt
[...]
) = 305
read(3, "", 4096)                       = 0
close(3)                                = 0
close(1)                                = 0
exit_group(0)                           = ?

not sure why, but strace shows always this close(1), no matter if i do
this on ext3/jfs/nfs. which kernel are you using?
> slox:~ # tune2fs -l /dev/sdb1
> tune2fs 1.28 (31-Aug-2002)
how comes e2fsprogs is *that* old on SLOX 4.1?
-- 
BOFH excuse #145:

Flat tire on station wagon with tapes.  ("Never underestimate the
bandwidth of a station wagon full of tapes hurling down the highway"
Andrew S. Tannenbaum)

evilninja at gmx.net

2005-Dec-16 15:13 UTC

head link

Strange delay in syscall close() on a large ext3-filesystem

Volker Dose wrote:> Good Morning,
> 
> If I do that strace cat <FILE> on another machine, I also see a
close(1):
ok. i don't know if this is important at all, i just noticed it.
> 
> 
> --- snip ---
> write(2, "\n", 1
> )                       = 1
> close(1)                                = 0
> exit_group(1)                           = ?
> --- snip ---
> 
> And on another cyrus-mailserver close(3) and close(1):
> 
> --- snip ---
> ) = 1723
> read(3, "", 4096)                       = 0
> close(3)                                = 0
> close(1)                                = 0
> exit_group(0)                           = ?
> --- snip ---
> 
> 
> What exactly do this numbers mean? Filedescriptor? Isn't 1,2 and 3 for
> STDIN,STDOUT and STDERR?
> 
> The tune2fs-binary is just what you get, when youo install an SLOX and do
> the usual updates:
> 
> --- snip ---
> slox:~ # rpm -qf /sbin/tune2fs
> e2fsprogs-1.28-30
> --- snip ---
> 
> On my SLES-9 cyrus-mailserver:
> mail:/var/spool/imap/user/volker # rpm -qf /sbin/tune2fs
> e2fsprogs-1.36-6.2
> --- snip ---
> 
> 
> The kernen-version on SLOX is:
> --- snip ---
> slox:~ # uname -a
> Linux slox 2.4.21-295-smp #1 SMP Wed Jun 29 16:45:30 UTC 2005 i686 unknown
> --- snip ---
> 
> SLOX is based on United Linux 1, as you can see, a quite old distribution.
I
> think it ios very close to SLES 8.
i really don't know what could've caused the long delays in close().
i'd
compile e2fsprogs-1.39-WIP-1210.tar.gz just to be sure no errors on the
fs are missed.

please respond on-list, so that the e2fs-gurus can comment....

Christian.
-- 
BOFH excuse #25:

Decreasing electron flux

Seemingly Similar Threads

Search for more possibly parallel threads

Ext3 users - Dec 2005 - Strange delay in syscall close() on a large ext3-filesystem

Strange delay in syscall close() on a large ext3-filesystem

Strange delay in syscall close() on a large ext3-filesystem

Strange delay in syscall close() on a large ext3-filesystem

Seemingly Similar Threads