thr3ads.net - Ext3 users - [patch 2.4] Fix ext3 scheduling storm and lockup [Jan 2003]

Andrew Morton

2003-Jan-18 01:05 UTC

[patch 2.4] Fix ext3 scheduling storm and lockup

This patch fixes an inefficiency and potential system lockup in the 2.4
kernel's ext3 filesystem.  The problem has been present since 2.4.20-pre5. 
This patch is applicable to 2.4.20.  A copy is at

http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.20/ext3-scheduling-storm.patch

Anyone who is using tasks which have realtime scheduling policy on ext3
systems should apply this change.


Details:

At the start of do_get_write_access() we have this logic:

	repeat:
		lock_buffer(jh->bh);
		...
		unlock_buffer(jh->bh);
		...
		if (jh->b_jlist == BJ_Shadow) {
			sleep_on_buffer(jh->bh);
			goto repeat;
		}

The problem is that the unlock_buffer() will wake up anyone who is sleeping
in the sleep_on_buffer().

So if task A is asleep in sleep_on_buffer() and task B now runs
do_get_write_access(), task B will wake task A by accident.  Task B will then
sleep on the buffer and task A will loop, will run unlock_buffer() and then
wake task B.

Net effect: the system does 100,000 context switches/sec until I/O completes
against the buffer and kjournald changes the value of jh->b_jlist.

Unless task A and task B happen to both have realtime scheduling policy - if
they do then kjournald will never run.  The state is never cleared and your
box locks up.


The fix is to not do the `goto repeat;' until the buffer has been taken off
the shadow list.  So we don't go and wake up the other waiter(s) until they
can actually proceed to use the buffer.
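
To make the difference concrete, here is a toy userspace model of the two
behaviours.  It is only a sketch, not kernel code: the tick-based single-CPU
scheduler, the strict "realtime waiters always beat kjournald" rule, and every
name in it (simulate, run_waiter, sim_result, wait_style) are assumptions made
up for illustration.  SLEEP_ON models the old code, where any wakeup sends the
task back to `repeat:'; WAIT_EVENT models the patched code, where a woken task
rechecks the shadow state and goes back to sleep without touching the buffer
lock.

```c
#include <assert.h>
#include <stdbool.h>

enum wait_style { SLEEP_ON, WAIT_EVENT };

struct sim_result {
	long waiter_runs;	/* how often a waiter task got the CPU */
	bool completed;		/* false if waiters were still stuck at max_ticks */
};

struct waiter {
	bool runnable;
	bool in_wait;		/* parked on the buffer's wait queue */
	bool done;
};

/* One scheduling slot for `me'; `other' sleeps on the same wait queue. */
static void run_waiter(struct waiter *me, struct waiter *other,
		       bool shadow, enum wait_style style)
{
	if (me->in_wait && style == WAIT_EVENT && shadow) {
		/* wait_event(): condition still false, sleep again quietly */
		me->runnable = false;
		return;
	}
	/* back at "repeat:": the unlock_buffer() wakes any sleeper */
	me->in_wait = false;
	if (other->in_wait)
		other->runnable = true;
	me->runnable = false;
	if (shadow)
		me->in_wait = true;	/* still on the shadow list: sleep */
	else
		me->done = true;	/* got write access */
}

/*
 * Task A starts asleep, task B has just entered do_get_write_access().
 * The model's kjournald needs io_ticks slots of CPU before it takes the
 * buffer off the shadow list and wakes everyone.
 */
struct sim_result simulate(enum wait_style style, bool waiters_realtime,
			   int io_ticks, long max_ticks)
{
	struct waiter w[2] = {
		{ .runnable = false, .in_wait = true },	/* task A */
		{ .runnable = true, .in_wait = false },	/* task B */
	};
	struct sim_result r = { 0, false };
	bool shadow = true;
	int next = 0;	/* round-robin cursor: 0 = A, 1 = B, 2 = kjournald */

	assert(io_ticks > 0);
	for (long tick = 0; tick < max_ticks; tick++) {
		int pick = -1;

		if (w[0].done && w[1].done)
			break;
		for (int i = 0; i < 3 && pick < 0; i++) {
			/* realtime: fixed priority A, B, kjournald; else round-robin */
			int c = waiters_realtime ? i : (next + i) % 3;

			if (c == 2 ? shadow : w[c].runnable)
				pick = c;
		}
		if (pick < 0)
			break;
		next = (pick + 1) % 3;
		if (pick == 2) {
			if (--io_ticks == 0) {
				/* commit: buffer leaves the shadow list */
				shadow = false;
				w[0].runnable = !w[0].done;
				w[1].runnable = !w[1].done;
			}
		} else {
			r.waiter_runs++;
			run_waiter(&w[pick], &w[1 - pick], shadow, style);
		}
	}
	r.completed = w[0].done && w[1].done;
	return r;
}
```

In this model, simulate(SLEEP_ON, false, 1000, 1000000) finishes, but only
after the two waiters have bounced each other awake a couple of thousand
times, while simulate(SLEEP_ON, true, 1000, 100000) never lets kjournald run
and returns with completed == false.  With WAIT_EVENT both cases finish after
a handful of waiter runs.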



diff -puN fs/jbd/transaction.c~ext3-scheduling-storm fs/jbd/transaction.c
--- 24/fs/jbd/transaction.c~ext3-scheduling-storm	2003-01-16 02:45:19.000000000 -0800
+++ 24-akpm/fs/jbd/transaction.c	2003-01-16 02:45:19.000000000 -0800
@@ -669,7 +669,8 @@ repeat:
 			spin_unlock(&journal_datalist_lock);
 			unlock_journal(journal);
 			/* commit wakes up all shadow buffers after IO */
-			sleep_on(&jh2bh(jh)->b_wait);
+			wait_event(jh2bh(jh)->b_wait,
+						jh->b_jlist != BJ_Shadow);
 			lock_journal(journal);
 			goto repeat;
 		}

_

Norman Schmidt

2003-Jan-18 08:49 UTC

Re: [patch 2.4] Fix ext3 scheduling storm and lockup

Hi Andrew!

Andrew Morton wrote:
> This patch fixes an inefficiency and potential system lockup in the 2.4
> kernel's ext3 filesystem.  The problem has been present since 2.4.20-pre5.
> This patch is applicable to 2.4.20.  A copy is at
>
> http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.20/ext3-scheduling-storm.patch

Which of the patches available on the above URL should be patched into a 
2.4.20 kernel anyway? All of them? I just did the three mentioned on 
http://www.zipworld.com.au/~akpm/linux/ext3/ ; perhaps you should update 
the doc a little bit?

Thanks, Norman Schmidt.
-- 
Norman Schmidt             Universitaet Erlangen-Nuernberg            _
cand.chem.                 Sysadmin Wohnheimnetzwerk RatNET        _ //
mailto:schmidt@naa.net                                             \X/

Andrew Morton

2003-Jan-18 09:06 UTC

Re: [patch 2.4] Fix ext3 scheduling storm and lockup

Norman Schmidt <norman.schmidt@ratnet.stw.uni-erlangen.de> wrote:
>
> Hi Andrew!
> 
> Andrew Morton wrote:
> > This patch fixes an inefficiency and potential system lockup in the 2.4
> > kernel's ext3 filesystem.  The problem has been present since 2.4.20-pre5.
> > This patch is applicable to 2.4.20.  A copy is at
> > 
> > http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.20/ext3-scheduling-storm.patch
> 
> Which of the patches available on the above URL should be patched into a 
> 2.4.20 kernel anyway? All of them? I just did the three mentioned on 
> http://www.zipworld.com.au/~akpm/linux/ext3/ ; perhaps you should update 
> the doc a little bit?
> 
Five of them :(

I've updated the instructions, thanks.
