thr3ads.net - Ext3 users - ext3 with quota under heavy load. [Jun 2003]

Dale

2003-Jun-26 13:46 UTC

ext3 with quota under heavy load.

Hello list,

I have a problem with an NFS server for my network.  It has run kernels
2.4.18-ac4 - 2.4.21-ac1, all with problems.  The -ac patches are used
to provide the new style quota support.  The system seems to have
gotten even less stable with the new kernel versions.

This morning around 5 am, I got a page that the system was not responding
to NFS requests.  I ssh'd in and found the loadavg at ~50.  Below are
some snippets from ps at the time:

root      3414  0.8  0.1  3904 3048 ?        DN   04:02   1:45
/usr/bin/updatedb -f NFS,SMBFS,NCPFS,PROC,DEVPTS -e /tmp,/var/tmp,/us
root      3979  0.0  0.0  2588 1192 ?        DN   04:14   0:00
/usr/bin/rsync -aH --delete /home/puser1 /home/puser2 /home/puser3

The rsync command is backing up across the network to a backup nfs
server.  updatedb starts at 4:02 am, and the rsync had been running
since 3:30 and was half-way completed (estimated by the 'p' in the
username).

Also there were 32 nfsd's just like this:
root  851  0.0  0.0   0    0 ?    DW   Jun19   4:35 [nfsd]

and these (the other 4 kjournald threads were in state SW):
root   7  0.1  0.0   0    0 ?     DW   Jun19  17:04 [kswapd]
root 144  0.0  0.0   0    0 ?     DW   Jun19   6:53 [kjournald]
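
As a rough aid for reading listings like the ones above, here is a minimal
Python sketch that picks out processes in uninterruptible sleep (a STAT
value starting with "D") from `ps aux`-style text.  It assumes the usual
eleven-column layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START,
TIME, COMMAND).  The names `blocked_processes` and `SAMPLE_PS` are made up
for illustration, and the PID 145 row is a hypothetical addition included
only to show a non-blocked process being skipped.

```python
def blocked_processes(ps_text):
    """Return (pid, stat, command) for each row whose STAT starts with 'D'."""
    blocked = []
    for line in ps_text.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        # Skip headers, blank lines, and wrapped command continuation lines.
        if len(fields) < 10 or not fields[1].isdigit():
            continue
        stat = fields[7]
        if stat.startswith("D"):
            command = fields[10] if len(fields) > 10 else ""
            blocked.append((int(fields[1]), stat, command))
    return blocked


# Rows copied from the listing above, plus one hypothetical SW row (PID 145).
SAMPLE_PS = """
root      3414  0.8  0.1  3904 3048 ?        DN   04:02   1:45
/usr/bin/updatedb -f NFS,SMBFS,NCPFS,PROC,DEVPTS -e /tmp,/var/tmp,/us
root  851  0.0  0.0   0    0 ?    DW   Jun19   4:35 [nfsd]
root 144  0.0  0.0   0    0 ?     DW   Jun19   6:53 [kjournald]
root 145  0.0  0.0   0    0 ?     SW   Jun19   0:10 [kjournald]
"""

if __name__ == "__main__":
    for pid, stat, command in blocked_processes(SAMPLE_PS):
        print(pid, stat, command)
```

A large and growing set of "D" rows that never clears is consistent with
the hang described here, but the listing alone does not say which lock the
processes are waiting on.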

I'm wondering what my options are.  This has happened ~10 times in the
last 6 months, although the system once went ~120 days without a
hiccup.  This last time, on 2.4.21-ac1, it lasted only 6 days.
It wouldn't be so bad if a `shutdown -r now` would restart it, but it
hangs while shutting down nfs and during killall, and needs a hard
reboot.

Thanks for any insight or solutions.

Dale

Andreas Dilger

2003-Jun-26 18:46 UTC

Re: ext3 with quota under heavy load.

On Jun 26, 2003  06:46 -0700, Dale wrote:
> I have a problem with an NFS server for my network.  It has run kernels
> 2.4.18-ac4 - 2.4.21-ac1, all with problems.  The -ac patches are used
> to provide the new style quota support.  The system seems to have
> gotten even less stable with the new kernel versions.
> 
> This morning around 5 am, I got a page that the system was not responding
> to NFS requests.  I ssh'd in and found the loadavg at ~50.  Below are
> some snippets from ps at the time:
> 
> root      3414  0.8  0.1  3904 3048 ?        DN   04:02   1:45
> /usr/bin/updatedb -f NFS,SMBFS,NCPFS,PROC,DEVPTS -e /tmp,/var/tmp,/us
> root      3979  0.0  0.0  2588 1192 ?        DN   04:14   0:00
> /usr/bin/rsync -aH --delete /home/puser1 /home/puser2 /home/puser3
> 
> The rsync command is backing up across the network to a backup nfs
> server.  updatedb starts at 4:02 am, and the rsync had been running
> since 3:30 and was half-way completed (estimated by the 'p' in the
> username).
> 
> Also there were 32 nfsd's just like this:
> root  851  0.0  0.0   0    0 ?    DW   Jun19   4:35 [nfsd]
> 
> and these (the other 4 kjournald threads were in state SW):
> root   7  0.1  0.0   0    0 ?     DW   Jun19  17:04 [kswapd]
> root 144  0.0  0.0   0    0 ?     DW   Jun19   6:53 [kjournald]
> 
> I'm wondering what my options are.  This has happened ~10 times in the
> last 6 months, although the system once went ~120 days without a
> hiccup.  This last time, on 2.4.21-ac1, it lasted only 6 days.
> It wouldn't be so bad if a `shutdown -r now` would restart it, but it
> hangs while shutting down nfs and during killall, and needs a hard
> reboot.

This almost certainly is a lock deadlock of some sort.  I've had pretty
good luck in debugging such problems just by running "sysrq-T" on the
console and/or using "crash" to examine the running kernel.  This needs
a fair amount of knowledge of the various locks in ext3.  The most
common problems are related to lock ordering problems with some process
starting a journal transaction and then blocking on a lock (e.g. directory
or inode semaphore, or superblock lock), and some other process holding
that lock and trying to start a new transaction when the journal is full.

The journal being full is a crucial issue, because if it isn't full you
can start a new transaction without problems, but when it is full you need
to flush the journal and wait for all existing users to free up their handles,
which will never happen if the first process has a transaction handle and is
blocked waiting for a lock the second process is holding.
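
To make the cycle concrete, here is a toy Python model of the scenario
described above.  It is only a sketch and not ext3 or JBD code: the task
names "A" and "B" and the functions `build_wait_graph` and
`find_wait_cycle` are invented for illustration.  "A" holds an open
transaction handle and is blocked on a lock held by "B"; "B" holds that
lock and wants to start a new transaction.

```python
def build_wait_graph(journal_full):
    """Map each task to the task it is blocked on (None if it can proceed)."""
    # "A" has a transaction handle open and is blocked on a lock held by "B".
    waits_for = {"A": "B"}
    # "B" holds that lock and wants a new transaction.  With free journal
    # space it proceeds; with a full journal it must wait for every open
    # handle to be released, which includes the handle held by "A".
    waits_for["B"] = "A" if journal_full else None
    return waits_for


def find_wait_cycle(waits_for):
    """Return the tasks forming a wait-for cycle, or None if there is none."""
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of "is blocked on" edges from this task.
        while node is not None and node not in seen:
            seen.append(node)
            node = waits_for.get(node)
        if node is not None:
            # We came back to a task already on the chain: a deadlock.
            return seen[seen.index(node):]
    return None


if __name__ == "__main__":
    print(find_wait_cycle(build_wait_graph(journal_full=False)))
    print(find_wait_cycle(build_wait_graph(journal_full=True)))
```

In this model the cycle appears only when the journal is full, which matches
the point that the same lock ordering can run without trouble for a long
time before it finally hangs.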

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

Poul Petersen

2003-Jul-14 19:30 UTC

RE: ext3 with quota under heavy load.

> The journal being full is a crucial issue, because if it isn't full you
> can start a new transaction without problems, but when it is full you need
> to flush the journal and wait for all existing users to free up their handles,
> which will never happen if the first process has a transaction handle and is
> blocked waiting for a lock the second process is holding.
> 
> Cheers, Andreas
> --
	Interesting - is there any way to monitor the journal usage to
determine if a given filesystem would benefit from an increased journal
size?

-poul
