thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status [May 2023]

If this information is useful, please help other people find it:
Share via:

Joseph Qi

2023-May-04 09:41 UTC

[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

On 5/4/23 4:02 PM, Heming Zhao wrote:> On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
>>
>>
>> On 5/4/23 2:21 PM, Heming Zhao wrote:
>>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
>>>>> fstest generic cases 347 361 628 629 trigger a same issue:
>>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep
going to commit
>>>>> journal.
>>>>
>>>> What's the end user impact?
>>>
>>> There are two impacts:
>>>
>>> 1. this issue makes unmount command hanging.
>>>
>>> No one likes or accepts filesystem unmount failure when underlying
devices meets
>>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could
do unmount
>>> successfully.
>>>
>>
>> So umount hang should be in the patch description, right?
> 
> OK.
> 
>>
>>> 2. for developers to verify their patch doesn't make any
regression.
>>>
>>> fstest is a famious & important fs testsuite.
>>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
>>>
>>> Current status, there are many test cases (about 9 in my env) cause
fstest
>>> hanging and blocking fstest to run. I did test for gfs2 on latest
tumbleweed,
>>> gfs2 only has 1 hanging case.
>>>
>>> In my view, ocfs2 developers or maintainers at least make ocfs2 to
finish the
>>> testsuite. Long-term target is to make ocfs2 to pass all the
testsuite.
>>>
>>> On kernel 6.2.9-1, the fstest 'quick' test group result:
>>> (the result based on my two patches [1/2] & [2/2])
>>> ```
>>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
>>>
>>> Failures: generic/003 generic/013 generic/015 generic/040
generic/041
>>> generic/062 generic/082 generic/104 generic/107 generic/228
generic/244
>>> generic/266 generic/272 generic/277 generic/281 generic/322
generic/329
>>> generic/331 generic/336 generic/343 generic/376 generic/379
generic/380
>>> generic/383 generic/384 generic/385 generic/386 generic/400
generic/410
>>> generic/424 generic/441 generic/448 generic/449 generic/471
generic/479
>>> generic/493 generic/495 generic/510 generic/537 generic/552
generic/563
>>> generic/578 generic/594 generic/607 generic/620 generic/628
generic/630
>>> generic/636 generic/702
>>> Failed 49 of 568 tests
>>> ```
>>>
>>>>
>>>> JBD2_ABORT indicates a fatal error happens, either in jounral
layer or
>>>> filesystem. And we should not commit any further transactions.
>>>> It seems that we may unify the behavior like:
>>>>
>>>> if (is_journal_aborted(journal))
>>>> 	return -EROFS;
>>>>
>>>>
>>> IIUC, do you mean add above code in ocfs2_commit_cache or in
>>> ocfs2_commit_thread?
>>
>> Yes, break the loop in ocfs2_commit_thread() in case of journal abort.
>> Actually we've handled this case before, but just limit the print.
But
>> it seems not enough now.
> 
> I wrote in my previous mail. Follow your idea, The code should be:
> 
> if (is_journal_aborted(journal)) {
> 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is
important.
> 	return -EROFS;
> }
> 
> Only return -EROFS then break loop in ocfs2_commit_thread() is not enough.
> Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the
new
> IOs will trigger IO error then trigger RO status. This flow is wrong, we
should
> mark RO as early as possible when JBD ABORT happens. In my view, the best
place
> is my [2/2] patch code which in ocfs2_commit_cache().
> 
Agree, but ocfs2_abort() is more appropriate here, see
ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
has to change to read-only accordingly.
>> BTW, the basic rule here is, we don't want to change journal to
prevent
>> other nodes corrupting the filesystem.
> 
> If my memory is correct, every node has itself special journal partition.
> If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed
machine,
> No other node could corrupting the fs.
> I don't think it's a good idea to modify journal even flush fails. Why
not catch EROFS in commit thread and break? Now we can only expect
umount continues with error, right?

Heming Zhao

2023-May-04 16:20 UTC

head link

[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi
wrote:> 
> 
> On 5/4/23 4:02 PM, Heming Zhao wrote:
> > On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/4/23 2:21 PM, Heming Zhao wrote:
> >>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote:
> >>>>
> >>>>
> >>>> On 4/30/23 11:13 AM, Heming Zhao wrote:
> >>>>> fstest generic cases 347 361 628 629 trigger a same
issue:
> >>>>> When jbd2 enter ABORT status, ocfs2 ignores it and
keep going to commit
> >>>>> journal.
> >>>>
> >>>> What's the end user impact?
> >>>
> >>> There are two impacts:
> >>>
> >>> 1. this issue makes unmount command hanging.
> >>>
> >>> No one likes or accepts filesystem unmount failure when
underlying devices meets
> >>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..)
could do unmount
> >>> successfully.
> >>>
> >>
> >> So umount hang should be in the patch description, right?
> > 
> > OK.
> > 
> >>
> >>> 2. for developers to verify their patch doesn't make any
regression.
> >>>
> >>> fstest is a famious & important fs testsuite.
> >>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.)
> >>>
> >>> Current status, there are many test cases (about 9 in my env)
cause fstest
> >>> hanging and blocking fstest to run. I did test for gfs2 on
latest tumbleweed,
> >>> gfs2 only has 1 hanging case.
> >>>
> >>> In my view, ocfs2 developers or maintainers at least make
ocfs2 to finish the
> >>> testsuite. Long-term target is to make ocfs2 to pass all the
testsuite.
> >>>
> >>> On kernel 6.2.9-1, the fstest 'quick' test group
result:
> >>> (the result based on my two patches [1/2] & [2/2])
> >>> ```
> >>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648
> >>>
> >>> Failures: generic/003 generic/013 generic/015 generic/040
generic/041
> >>> generic/062 generic/082 generic/104 generic/107 generic/228
generic/244
> >>> generic/266 generic/272 generic/277 generic/281 generic/322
generic/329
> >>> generic/331 generic/336 generic/343 generic/376 generic/379
generic/380
> >>> generic/383 generic/384 generic/385 generic/386 generic/400
generic/410
> >>> generic/424 generic/441 generic/448 generic/449 generic/471
generic/479
> >>> generic/493 generic/495 generic/510 generic/537 generic/552
generic/563
> >>> generic/578 generic/594 generic/607 generic/620 generic/628
generic/630
> >>> generic/636 generic/702
> >>> Failed 49 of 568 tests
> >>> ```
> >>>
> >>>>
> >>>> JBD2_ABORT indicates a fatal error happens, either in
jounral layer or
> >>>> filesystem. And we should not commit any further
transactions.
> >>>> It seems that we may unify the behavior like:
> >>>>
> >>>> if (is_journal_aborted(journal))
> >>>> 	return -EROFS;
> >>>>
> >>>>
> >>> IIUC, do you mean add above code in ocfs2_commit_cache or in
> >>> ocfs2_commit_thread?
> >>
> >> Yes, break the loop in ocfs2_commit_thread() in case of journal
abort.
> >> Actually we've handled this case before, but just limit the
print. But
> >> it seems not enough now.
> > 
> > I wrote in my previous mail. Follow your idea, The code should be:
> > 
> > if (is_journal_aborted(journal)) {
> > 	ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this
line is important.
> > 	return -EROFS;
> > }
> > 
> > Only return -EROFS then break loop in ocfs2_commit_thread() is not
enough.
> > Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then
the new
> > IOs will trigger IO error then trigger RO status. This flow is wrong,
we should
> > mark RO as early as possible when JBD ABORT happens. In my view, the
best place
> > is my [2/2] patch code which in ocfs2_commit_cache().
> > 
> 
> Agree, but ocfs2_abort() is more appropriate here, see
> ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it
> has to change to read-only accordingly.
You are right. I ever used ocfs2_abort() then got panic. :)
So I changed to ocfs2_error(). I also sent two mails in 2023-4-30 (the first
mail has mess format by thunderbird). I copy the mail content here:
```
More info for this patch, maybe I should write below info in commit log.
>From the comment of __ocfs2_abort(), there is another way to handlejournal ABORT: panic.
And fstest generic test NO. 648 will trigger ocfs2_abort then make
kernel panic. I don't like the panic style, ocfs2 should elegantly
handle journal abnormal case.
```

fstest NO. 648 could trigger ocfs2_abort from ocfs2_start_trans(). So you could
find my previous mail, I ran the fstest with parameter '-e generic/648',
which
means fstest excludes NO. 648.

The only different between ocfs2_abort and ocfs2_error is set
OCFS2_MOUNT_ERRORS_PANIC, which could trigger panic. If you think panic is
brute, I suggest to use ocfs2_error.
> 
> >> BTW, the basic rule here is, we don't want to change journal
to prevent
> >> other nodes corrupting the filesystem.
> > 
> > If my memory is correct, every node has itself special journal
partition.
> > If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed
machine,
> > No other node could corrupting the fs.
> > 
> I don't think it's a good idea to modify journal even flush fails.
Why
> not catch EROFS in commit thread and break? Now we can only expect
> umount continues with error, right?
> 
With your idea, I spent more than two hours for coding & testing. The result
is
my patch [2/2] is correct. Please believe my patch [2/2].

In patch [2/2], there is 'goto reset' which will resolve below three
issues:

ocfs2_commit_cache
 + ... ...
reset:  //goto label
 + ocfs2_inc_trans_id(journal); //<1.1>
 + atomic_set(&journal->j_num_trans, 0); //<2>
 + ocfs2_wake_downconvert_thread(osb); //<3>
 + wake_up(&journal->j_checkpointed); //<1.2>
 
1> trigger hanging

umount
 ... ...
  ocfs2_clear_inode
   ocfs2_checkpoint_inode(inode)
    + ocfs2_ci_fully_checkpointed() //1.1: check '->j_trans_id'
    + wait_event(osb->journal->j_checkpointed, ...)
       + 1.2: will wait forever, if commit thread exits.

2> trigger BUG_ON()

ocfs2_journal_shutdown
 + BUG_ON(atomic_read(&(osb->journal->j_num_trans)) != 0);

3> trigger dlm lock convert blocking

if there is pending downconvert lock, no one kick it.

At last, you could set up a fstest env and verify my writing.

- Heming

Seemingly Similar Threads

Search for more possibly parallel threads

Ocfs2 devel - May 2023 - [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status

Seemingly Similar Threads