thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status [May 2023]

If this information is useful, please help other people find it:
Share via:

Heming Zhao

2023-May-04 08:02 UTC

[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi
wrote:> 
> 
> On 5/4/23 2:21 PM, Heming Zhao wrote:
> > On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 4/30/23 11:13 AM, Heming Zhao wrote:
> >>> fstest generic cases 347 361 628 629 trigger a same issue:
> >>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going
to commit
> >>> journal.
> >>
> >> What's the end user impact?
> > 
> > There are two impacts:
> > 
> > 1. this issue makes unmount command hanging.
> > 
> > No one likes or accepts filesystem unmount failure when underlying
devices meets
> > error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do
unmount
> > successfully.
> > 
> 
> So umount hang should be in the patch description, right?
OK.
> 
> > 2. for developers to verify their patch doesn't make any
regression.
> > 
> > fstest is a famious & important fs testsuite.
> > (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
> > 
> > Current status, there are many test cases (about 9 in my env) cause
fstest
> > hanging and blocking fstest to run. I did test for gfs2 on latest
tumbleweed,
> > gfs2 only has 1 hanging case.
> > 
> > In my view, ocfs2 developers or maintainers at least make ocfs2 to
finish the
> > testsuite. Long-term target is to make ocfs2 to pass all the
testsuite.
> > 
> > On kernel 6.2.9-1, the fstest 'quick' test group result:
> > (the result based on my two patches [1/2] & [2/2])
> > ```
> > ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
> > 
> > Failures: generic/003 generic/013 generic/015 generic/040 generic/041
> > generic/062 generic/082 generic/104 generic/107 generic/228
generic/244
> > generic/266 generic/272 generic/277 generic/281 generic/322
generic/329
> > generic/331 generic/336 generic/343 generic/376 generic/379
generic/380
> > generic/383 generic/384 generic/385 generic/386 generic/400
generic/410
> > generic/424 generic/441 generic/448 generic/449 generic/471
generic/479
> > generic/493 generic/495 generic/510 generic/537 generic/552
generic/563
> > generic/578 generic/594 generic/607 generic/620 generic/628
generic/630
> > generic/636 generic/702
> > Failed 49 of 568 tests
> > ```
> > 
> >>
> >> JBD2_ABORT indicates a fatal error happens, either in jounral
layer or
> >> filesystem. And we should not commit any further transactions.
> >> It seems that we may unify the behavior like:
> >>
> >> if (is_journal_aborted(journal))
> >> 	return -EROFS;
> >>
> >>
> > IIUC, do you mean add above code in ocfs2_commit_cache or in
> > ocfs2_commit_thread?
> 
> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
> Actually we've handled this case before, but just limit the print. But
> it seems not enough now.
I wrote in my previous mail. Follow your idea, The code should be:

if (is_journal_aborted(journal)) {
	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is
important.
	return -EROFS;
}

Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new
IOs will trigger IO error then trigger RO status. This flow is wrong, we should
mark RO as early as possible when JBD ABORT happens. In my view, the best place
is my [2/2] patch code which in ocfs2_commit_cache().
> BTW, the basic rule here is, we don't want to change journal to prevent
> other nodes corrupting the filesystem.
If my memory is correct, every node has itself special journal partition.
If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, 
No other node could corrupting the fs.
> 
> > 1>
> > If in ocfs2_commit_cache, this fix doesn't take any effect for
unmount
> > failure(hanging).
> > When return -EROFS, the caller ocfs2_commit_thread just print error
message and
> > keep going to retry.
> > 2>
> > If in ocfs2_commit_thread, it makes commit thread exit/stop without
notifying
> > other ocfs2 paths. Later wake_up(&osb->checkpoint_event)
won't take effect,
> > ocfs2 will enter chaose. 
> > 
> > In my view, there is missing a place to set fs RO. So [2/2] in
> > ocfs2_commit_cache, uses ocfs2_error to mark RO before returning
status (<0
> > value). The ocfs2_error is the right place to mark RO.
> > 
> > - Heming
> > 
> >>>
> >>> This commit gives ocfs2 ability to handle jbd2 ABORT case.
> >>>
> >>> Signed-off-by: Heming Zhao <heming.zhao at suse.com>
> >>> ---
> >>>  fs/ocfs2/alloc.c      | 10 ++++++----
> >>>  fs/ocfs2/journal.c    |  5 +++++
> >>>  fs/ocfs2/localalloc.c |  3 +++
> >>>  3 files changed, 14 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> >>> index 51c93929a146..d90961a1c433 100644
> >>> --- a/fs/ocfs2/alloc.c
> >>> +++ b/fs/ocfs2/alloc.c
> >>> @@ -6308,11 +6308,13 @@ void
ocfs2_truncate_log_shutdown(struct ocfs2_super *osb)
> >>>  
> >>>  	if (tl_inode) {
> >>>  		cancel_delayed_work(&osb->osb_truncate_log_wq);
> >>> -		flush_workqueue(osb->ocfs2_wq);
> >>> +		if (!is_journal_aborted(osb->journal->j_journal)) {
> >>> +			flush_workqueue(osb->ocfs2_wq);
> >>>  
> >>> -		status = ocfs2_flush_truncate_log(osb);
> >>> -		if (status < 0)
> >>> -			mlog_errno(status);
> >>> +			status = ocfs2_flush_truncate_log(osb);
> >>> +			if (status < 0)
> >>> +				mlog_errno(status);
> >>> +		}
> >>>  
> >>>  		brelse(osb->osb_tl_bh);
> >>>  		iput(osb->osb_tl_inode);
> >>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> >>> index 25d8072ccfce..2798067a2f66 100644
> >>> --- a/fs/ocfs2/journal.c
> >>> +++ b/fs/ocfs2/journal.c
> >>> @@ -312,11 +312,16 @@ static int ocfs2_commit_cache(struct
ocfs2_super *osb)
> >>>  	status = jbd2_journal_flush(journal->j_journal, 0);
> >>>  	jbd2_journal_unlock_updates(journal->j_journal);
> >>>  	if (status < 0) {
> >>> +		if (is_journal_aborted(journal->j_journal)) {
> >>> +			ocfs2_error(osb->sb, "jbd2 status:
ABORT.\n");
> >>> +			goto reset;
> >>> +		}
> >>>  		up_write(&journal->j_trans_barrier);
> >>>  		mlog_errno(status);
> >>>  		goto finally;
> >>>  	}
> >>>  
> >>> +reset:
> >>>  	ocfs2_inc_trans_id(journal);
> >>>  
> >>>  	flushed = atomic_read(&journal->j_num_trans);
> >>> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> >>> index c4426d12a2ad..e2e3400717b0 100644
> >>> --- a/fs/ocfs2/localalloc.c
> >>> +++ b/fs/ocfs2/localalloc.c
> >>> @@ -378,6 +378,9 @@ void ocfs2_shutdown_local_alloc(struct
ocfs2_super *osb)
> >>>  	if (osb->ocfs2_wq)
> >>>  		flush_workqueue(osb->ocfs2_wq);
> >>>  
> >>> +	if (is_journal_aborted(osb->journal->j_journal))
> >>> +		goto out;
> >>> +
> >>>  	if (osb->local_alloc_state == OCFS2_LA_UNUSED)
> >>>  		goto out;
> >>>

Joseph Qi

2023-May-04 09:41 UTC

head link

[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

On 5/4/23 4:02 PM, Heming Zhao wrote:> On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
>>
>>
>> On 5/4/23 2:21 PM, Heming Zhao wrote:
>>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
>>>>> fstest generic cases 347 361 628 629 trigger a same issue:
>>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep
going to commit
>>>>> journal.
>>>>
>>>> What's the end user impact?
>>>
>>> There are two impacts:
>>>
>>> 1. this issue makes unmount command hanging.
>>>
>>> No one likes or accepts filesystem unmount failure when underlying
devices meets
>>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could
do unmount
>>> successfully.
>>>
>>
>> So umount hang should be in the patch description, right?
> 
> OK.
> 
>>
>>> 2. for developers to verify their patch doesn't make any
regression.
>>>
>>> fstest is a famious & important fs testsuite.
>>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
>>>
>>> Current status, there are many test cases (about 9 in my env) cause
fstest
>>> hanging and blocking fstest to run. I did test for gfs2 on latest
tumbleweed,
>>> gfs2 only has 1 hanging case.
>>>
>>> In my view, ocfs2 developers or maintainers at least make ocfs2 to
finish the
>>> testsuite. Long-term target is to make ocfs2 to pass all the
testsuite.
>>>
>>> On kernel 6.2.9-1, the fstest 'quick' test group result:
>>> (the result based on my two patches [1/2] & [2/2])
>>> ```
>>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
>>>
>>> Failures: generic/003 generic/013 generic/015 generic/040
generic/041
>>> generic/062 generic/082 generic/104 generic/107 generic/228
generic/244
>>> generic/266 generic/272 generic/277 generic/281 generic/322
generic/329
>>> generic/331 generic/336 generic/343 generic/376 generic/379
generic/380
>>> generic/383 generic/384 generic/385 generic/386 generic/400
generic/410
>>> generic/424 generic/441 generic/448 generic/449 generic/471
generic/479
>>> generic/493 generic/495 generic/510 generic/537 generic/552
generic/563
>>> generic/578 generic/594 generic/607 generic/620 generic/628
generic/630
>>> generic/636 generic/702
>>> Failed 49 of 568 tests
>>> ```
>>>
>>>>
>>>> JBD2_ABORT indicates a fatal error happens, either in jounral
layer or
>>>> filesystem. And we should not commit any further transactions.
>>>> It seems that we may unify the behavior like:
>>>>
>>>> if (is_journal_aborted(journal))
>>>> 	return -EROFS;
>>>>
>>>>
>>> IIUC, do you mean add above code in ocfs2_commit_cache or in
>>> ocfs2_commit_thread?
>>
>> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
>> Actually we've handled this case before, but just limit the print.
But
>> it seems not enough now.
> 
> I wrote in my previous mail. Follow your idea, The code should be:
> 
> if (is_journal_aborted(journal)) {
> 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is
important.
> 	return -EROFS;
> }
> 
> Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
> Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the
new
> IOs will trigger IO error then trigger RO status. This flow is wrong, we
should
> mark RO as early as possible when JBD ABORT happens. In my view, the best
place
> is my [2/2] patch code which in ocfs2_commit_cache().
> 
Agree, but ocfs2_abort() is more appropriate here, see
ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
has to change to read-only accordingly.
>> BTW, the basic rule here is, we don't want to change journal to
prevent
>> other nodes corrupting the filesystem.
> 
> If my memory is correct, every node has itself special journal partition.
> If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed
machine,
> No other node could corrupting the fs.
> I don't think it's a good idea to modify journal even flush fails. Why
not catch EROFS in commit thread and break? Now we can only expect
umount continues with error, right?

Ocfs2 devel - May 2023 - [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status