Hi all, and particularly Andrew and Stephen,

I recently "upgraded" one of my NFS fileservers from (patched) 2.4.18
to 2.4.23-pre7 (in order to resolve a HIGHMEM-related memory pressure
problem).  Unfortunately I have experienced what appears to be a
deadlock.  The one I will describe was experienced while running
2.4.23-pre7, though I had a very similar problem in 2.4.22 (but didn't
collect enough data to be able to give a useful report at the time).

An ext3 filesystem is mounted with data=journal, with an external
journal, and is accessed mostly via NFS.  v1 quotas are enabled.

The stack traces below are from a lightly edited
"echo t > /proc/sysrq-trigger" output.

A substantial number of nfsd threads are waiting to "start a handle":

  nfsd          Call Trace:    [sleep_on+75/124]
    [start_this_handle+205/368] [journal_start+149/196]
    [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
    [update_atime+75/80] [do_generic_file_read+1158/1172]
    [generic_file_read+147/400] [file_read_actor+0/224]
    [nfsd_read+477/636] [nfsd3_proc_read+295/388]
    [nfsd_dispatch+207/406] [svc_process+655/1264] [nfsd+566/944]
    [arch_kernel_thread+40/56]

This sleep_on is the one at line 136 of transaction.c
(start_this_handle), where it is waiting for t_state to stop being
T_LOCKED.

The related kjournald is at:

  kjournald     Call Trace:    [sleep_on+75/124]
    [journal_commit_transaction+357/4044] [do_IRQ+221/236]
    [.text.lock.sched+131/471] [kjournald+326/540]
    [commit_timeout+0/12] [arch_kernel_thread+40/56]

This sleep_on is at line 87 in commit.c (journal_commit_transaction),
where it is waiting for t_updates to be 0.  At this point, t_state is
T_LOCKED, so presumably those nfsd threads above are waiting on
kjournald.  But what is kjournald really waiting for?

My first thought was the two nfsd threads in:

  nfsd          Call Trace:    [sleep_on+75/124]
    [log_wait_commit+74/136] [journal_stop+408/432]
    [journal_force_commit+78/128] [ext3_force_commit+66/112]
    [ext3_sync_file+128/144] [nfsd_sync_dir+49/72]
    [nfsd_unlink+455/480] [nfsd_proc_remove+122/140]
    [nfsd_dispatch+207/406] [svc_process+655/1264] [nfsd+566/944]
    [arch_kernel_thread+40/56]

These are waiting on j_wait_done_commit.  However, they are doing that
from journal_stop *after* journal_stop has decremented t_updates, so
it doesn't seem likely that kjournald is waiting on them.

Outside of nfsd, there is an rquotad program (locally written, not the
standard one) that is at:

  rquotad       Call Trace:    [sleep_on+75/124]
    [start_this_handle+205/368] [journal_start+149/196]
    [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
    [update_atime+75/80] [do_generic_file_read+1158/1172]
    [generic_file_read+147/400] [file_read_actor+0/224]
    [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
    [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
    [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
    [do_compat_quotactl+417/688] [resolve_dev+89/108]
    [sys_quotactl+166/275] [system_call+51/56]

So it is trying to start a transaction to update the atime on the
quota file, and it holds a lock on some quota structures thanks to
"read_dqblk".

At the same time, "sync" is running:

  sync          Call Trace:    [__down+109/208] [__down_failed+8/12]
    [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
    [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
    [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56]

It has started an ext3 transaction (in ext3_sync_dquot) and is trying
to get the lock that rquotad holds.
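For reference, the two waits involved look roughly like this in JBD
(an abridged paraphrase of 2.4's fs/jbd, illustrative rather than
literal):

    /* transaction.c: start_this_handle() -- a new handle must wait
     * while commit has the running transaction locked. */
    while (transaction->t_state == T_LOCKED)
            sleep_on(&journal->j_wait_transaction_locked);
    transaction->t_updates++;          /* handle is now open */

    /* commit.c: journal_commit_transaction() -- commit locks the
     * transaction, then waits for every open handle to journal_stop(). */
    commit_transaction->t_state = T_LOCKED;
    while (commit_transaction->t_updates)
            sleep_on(&journal->j_wait_updates);

Once commit sets T_LOCKED it cannot proceed until every open handle has
been stopped, and no new handle can start until commit moves on.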
Presumably the transaction that sync has started is keeping t_updates
greater than 0, thus preventing kjournald from progressing, and that
in turn prevents anyone else, including rquotad, from starting a new
transaction.  Hence a deadlock.

The incomplete trace that I have from the first time this problem
happened also shows sync blocked in the same place, and an nfsd
thread, rather than rquotad, holding the lock in read_dqblk.

I note that ext3_sync_dquot didn't exist in 2.4.18 but does now, which
correlates well with the observation that this problem was introduced
some time after 2.4.18.

If my analysis is correct, I can largely avoid the problem by not
running "sync" as often (I currently run it every 5 minutes, which
isn't needed any longer (it's a long story :-)), but that doesn't fix
the bug.

My guess is that ext3_sync_dquot doesn't need ext3_journal_start at
all, but that isn't a well-informed guess.

Help appreciated.

NeilBrown
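P.S. To make the inversion concrete, here is a toy userspace model of
the two lock orderings: plain pthread mutexes standing in for dqio_sem
and the journal handle (a crude approximation, since a journal admits
many concurrent handles).  All names are made up; builds with
"cc -pthread abba.c" and will almost always hang.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t dqio_sem = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t journal  = PTHREAD_MUTEX_INITIALIZER;

    static void *rquotad_path(void *unused)
    {
            pthread_mutex_lock(&dqio_sem);  /* A: read_dqblk() takes dqio_sem */
            usleep(100000);                 /* widen the race window */
            pthread_mutex_lock(&journal);   /* B: ext3_dirty_inode() starts a
                                             * handle; blocks here */
            pthread_mutex_unlock(&journal);
            pthread_mutex_unlock(&dqio_sem);
            return NULL;
    }

    static void *sync_path(void *unused)
    {
            pthread_mutex_lock(&journal);   /* B: ext3_sync_dquot() opens a
                                             * handle first */
            usleep(100000);
            pthread_mutex_lock(&dqio_sem);  /* A: commit path wants dqio_sem;
                                             * blocks here: AB-BA deadlock */
            pthread_mutex_unlock(&dqio_sem);
            pthread_mutex_unlock(&journal);
            return NULL;
    }

    int main(void)
    {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, rquotad_path, NULL);
            pthread_create(&t2, NULL, sync_path, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            puts("no deadlock this run");   /* almost never reached */
            return 0;
    }

Run it and both threads stop dead, one holding each lock: the same
AB-BA cycle as rquotad and sync above.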
Christopher Li
2003-Oct-27 06:20 UTC
Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
On Mon, Oct 27, 2003 at 05:06:16PM +1100, Neil Brown wrote:
>
> The related kjournald is at:
>
>   kjournald     Call Trace:    [sleep_on+75/124]
>     [journal_commit_transaction+357/4044] [do_IRQ+221/236]
>     [.text.lock.sched+131/471] [kjournald+326/540]
>     [commit_timeout+0/12] [arch_kernel_thread+40/56]
>
> This sleep_on is at line 87 in commit.c (journal_commit_transaction),
> where it is waiting for t_updates to be 0.  At this point, t_state is
> T_LOCKED, so presumably those nfsd threads above are waiting on
> kjournald.  But what is kjournald really waiting for?

kjournald is waiting for the updates on the current pending
transaction to finish.

> My first thought was the two nfsd threads in:
>
>   nfsd          Call Trace:    [sleep_on+75/124]
>     [log_wait_commit+74/136] [journal_stop+408/432]
>     [journal_force_commit+78/128] [ext3_force_commit+66/112]
>     [ext3_sync_file+128/144] [nfsd_sync_dir+49/72]
>     [nfsd_unlink+455/480] [nfsd_proc_remove+122/140]
>     [nfsd_dispatch+207/406] [svc_process+655/1264] [nfsd+566/944]
>     [arch_kernel_thread+40/56]
>
> These are waiting on j_wait_done_commit.  However, they are doing that
> from journal_stop *after* journal_stop has decremented t_updates, so
> it doesn't seem likely that kjournald is waiting on them.

That is right.

> Outside of nfsd, there is an rquotad program (locally written, not the
> standard one) that is at:
>
>   rquotad       Call Trace:    [sleep_on+75/124]
>     [start_this_handle+205/368] [journal_start+149/196]
>     [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
>     [update_atime+75/80] [do_generic_file_read+1158/1172]
>     [generic_file_read+147/400] [file_read_actor+0/224]
>     [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
>     [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
>     [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
>     [do_compat_quotactl+417/688] [resolve_dev+89/108]
>     [sys_quotactl+166/275] [system_call+51/56]
>
> So it is trying to start a transaction to update the atime on the
> quota file, and it holds a lock on some quota structures thanks to
> "read_dqblk".

This guy is waiting for the journal commit to finish; it seems
harmless to me.

> At the same time, "sync" is running:
>
>   sync          Call Trace:    [__down+109/208] [__down_failed+8/12]
>     [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
>     [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
>     [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56]
>
> It has started an ext3 transaction (in ext3_sync_dquot) and is trying
> to get the lock that rquotad holds.

That seems wrong to me.  It should get the lock before it starts the
transaction.  For the same reason that you can't lock_page inside a
journal transaction, it is a ranking error.

BTW, it seems that in the current bk tree, truncate still does
lock_page inside a journal transaction.

> Presumably the transaction that sync has started is keeping t_updates
> greater than 0, thus preventing kjournald from progressing, and that
> in turn prevents anyone else, including rquotad, from starting a new
> transaction.  Hence a deadlock.

That is right.

> My guess is that ext3_sync_dquot doesn't need ext3_journal_start at
> all, but that isn't a well-informed guess.

I think you want ext3_sync_dquot to be atomic across a power failure.
The journal handle can be obtained from ext3_current_journal_handle,
which is used by writepage etc.

Chris
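P.S. A rough sketch of that idea, with the function name as given
above and the credit count left as a placeholder "nblocks"; untested
pseudocode against 2.4 ext3, not a patch:

    /* Untested sketch: reuse an already-open handle in this context
     * instead of calling journal_start() again from inside
     * ext3_sync_dquot().  Error handling elided. */
    handle_t *handle = ext3_current_journal_handle();
    int started_here = 0;

    if (!handle) {
            handle = ext3_journal_start(inode, nblocks);
            started_here = 1;
    }
    /* ... write the dquot block under this handle ... */
    if (started_here)
            ext3_journal_stop(handle, inode);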
Andrew Morton
2003-Oct-27 10:01 UTC
Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> Hi all, and particularly Andrew and Stephen,
>
> I recently "upgraded" one of my NFS fileservers from (patched) 2.4.18
> to 2.4.23-pre7 (in order to resolve a HIGHMEM-related memory pressure
> problem).
>
> Unfortunately I have experienced what appears to be a deadlock.

I don't recall a time when quotas on ext3 were not deadlocky :(

> ...
>
>   rquotad       Call Trace:    [sleep_on+75/124]
>     [start_this_handle+205/368] [journal_start+149/196]
>     [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
>     [update_atime+75/80] [do_generic_file_read+1158/1172]
>     [generic_file_read+147/400] [file_read_actor+0/224]
>     [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
>     [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
>     [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
>     [do_compat_quotactl+417/688] [resolve_dev+89/108]
>     [sys_quotactl+166/275] [system_call+51/56]

read_dqblk() took dqio_sem, then ext3_dirty_inode() did journal_start().

> At the same time, "sync" is running:
>
>   sync          Call Trace:    [__down+109/208] [__down_failed+8/12]
>     [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
>     [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
>     [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56]

ext3_sync_dquot() did journal_start(), then someone (commit_dqblk?)
tried to take dqio_sem.

Probably it is not too hard to flip the ordering in the latter case,
but all the other code paths need to be checked, and perhaps even
documented.

Jan, didn't we fix this very problem in 2.6 recently?
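For concreteness, flipping the ordering in ext3_sync_dquot() might
look something like this; the lock and helper names are approximations
taken from the thread, an untested sketch rather than a patch:

    /* Untested sketch: take dqio_sem before opening the journal
     * handle, so this path matches read_dqblk()'s dqio_sem -> journal
     * ordering instead of inverting it.  "nblocks" is a placeholder
     * for the credits needed; error handling elided. */
    down(&sb_dqopt(sb)->dqio_sem);                /* A: quota I/O lock */
    handle = ext3_journal_start(inode, nblocks);  /* B: journal handle */
    /* ... commit the dquot block to disk ... */
    ext3_journal_stop(handle, inode);
    up(&sb_dqopt(sb)->dqio_sem);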