Joseph Qi
2023-May-04 09:41 UTC
[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
On 5/4/23 4:02 PM, Heming Zhao wrote:> On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote: >> >> >> On 5/4/23 2:21 PM, Heming Zhao wrote: >>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote: >>>> >>>> >>>> On 4/30/23 11:13 AM, Heming Zhao wrote: >>>>> fstest generic cases 347 361 628 629 trigger a same issue: >>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit >>>>> journal. >>>> >>>> What's the end user impact? >>> >>> There are two impacts: >>> >>> 1. this issue makes unmount command hanging. >>> >>> No one likes or accepts filesystem unmount failure when underlying devices meets >>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount >>> successfully. >>> >> >> So umount hang should be in the patch description, right? > > OK. > >> >>> 2. for developers to verify their patch doesn't make any regression. >>> >>> fstest is a famious & important fs testsuite. >>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.) >>> >>> Current status, there are many test cases (about 9 in my env) cause fstest >>> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed, >>> gfs2 only has 1 hanging case. >>> >>> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the >>> testsuite. Long-term target is to make ocfs2 to pass all the testsuite. >>> >>> On kernel 6.2.9-1, the fstest 'quick' test group result: >>> (the result based on my two patches [1/2] & [2/2]) >>> ``` >>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648 >>> >>> Failures: generic/003 generic/013 generic/015 generic/040 generic/041 >>> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244 >>> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329 >>> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380 >>> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410 >>> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479 >>> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563 >>> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630 >>> generic/636 generic/702 >>> Failed 49 of 568 tests >>> ``` >>> >>>> >>>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or >>>> filesystem. And we should not commit any further transactions. >>>> It seems that we may unify the behavior like: >>>> >>>> if (is_journal_aborted(journal)) >>>> return -EROFS; >>>> >>>> >>> IIUC, do you mean add above code in ocfs2_commit_cache or in >>> ocfs2_commit_thread? >> >> Yes, break the loop in ocfs2_commit_thread() in case of journal abort. >> Actually we've handled this case before, but just limit the print. But >> it seems not enough now. > > I wrote in my previous mail. Follow your idea, The code should be: > > if (is_journal_aborted(journal)) { > ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important. > return -EROFS; > } > > Only return -EROFS then break loop in ocfs2_commit_thread() is not enough. > Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new > IOs will trigger IO error then trigger RO status. This flow is wrong, we should > mark RO as early as possible when JBD ABORT happens. In my view, the best place > is my [2/2] patch code which in ocfs2_commit_cache(). >Agree, but ocfs2_abort() is more appropriate here, see ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it has to change to read-only accordingly.>> BTW, the basic rule here is, we don't want to change journal to prevent >> other nodes corrupting the filesystem. > > If my memory is correct, every node has itself special journal partition. > If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, > No other node could corrupting the fs. >I don't think it's a good idea to modify journal even flush fails. Why not catch EROFS in commit thread and break? Now we can only expect umount continues with error, right?
Heming Zhao
2023-May-04 16:20 UTC
[Ocfs2-devel] [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi wrote:> > > On 5/4/23 4:02 PM, Heming Zhao wrote: > > On Thu, May 04, 2023 at 03:34:49PM +0800, Joseph Qi wrote: > >> > >> > >> On 5/4/23 2:21 PM, Heming Zhao wrote: > >>> On Thu, May 04, 2023 at 10:27:46AM +0800, Joseph Qi wrote: > >>>> > >>>> > >>>> On 4/30/23 11:13 AM, Heming Zhao wrote: > >>>>> fstest generic cases 347 361 628 629 trigger a same issue: > >>>>> When jbd2 enter ABORT status, ocfs2 ignores it and keep going to commit > >>>>> journal. > >>>> > >>>> What's the end user impact? > >>> > >>> There are two impacts: > >>> > >>> 1. this issue makes unmount command hanging. > >>> > >>> No one likes or accepts filesystem unmount failure when underlying devices meets > >>> error. For comparison, other FSs (e.g. ext4, xfs, gfs2, ..) could do unmount > >>> successfully. > >>> > >> > >> So umount hang should be in the patch description, right? > > > > OK. > > > >> > >>> 2. for developers to verify their patch doesn't make any regression. > >>> > >>> fstest is a famious & important fs testsuite. > >>> (Yes, I know ocfs2 has itself testsuite ocfs2_test.) > >>> > >>> Current status, there are many test cases (about 9 in my env) cause fstest > >>> hanging and blocking fstest to run. I did test for gfs2 on latest tumbleweed, > >>> gfs2 only has 1 hanging case. > >>> > >>> In my view, ocfs2 developers or maintainers at least make ocfs2 to finish the > >>> testsuite. Long-term target is to make ocfs2 to pass all the testsuite. > >>> > >>> On kernel 6.2.9-1, the fstest 'quick' test group result: > >>> (the result based on my two patches [1/2] & [2/2]) > >>> ``` > >>> ./check -g quick -T -b -s ocfs2 -e generic/081 -e generic/648 > >>> > >>> Failures: generic/003 generic/013 generic/015 generic/040 generic/041 > >>> generic/062 generic/082 generic/104 generic/107 generic/228 generic/244 > >>> generic/266 generic/272 generic/277 generic/281 generic/322 generic/329 > >>> generic/331 generic/336 generic/343 generic/376 generic/379 generic/380 > >>> generic/383 generic/384 generic/385 generic/386 generic/400 generic/410 > >>> generic/424 generic/441 generic/448 generic/449 generic/471 generic/479 > >>> generic/493 generic/495 generic/510 generic/537 generic/552 generic/563 > >>> generic/578 generic/594 generic/607 generic/620 generic/628 generic/630 > >>> generic/636 generic/702 > >>> Failed 49 of 568 tests > >>> ``` > >>> > >>>> > >>>> JBD2_ABORT indicates a fatal error happens, either in jounral layer or > >>>> filesystem. And we should not commit any further transactions. > >>>> It seems that we may unify the behavior like: > >>>> > >>>> if (is_journal_aborted(journal)) > >>>> return -EROFS; > >>>> > >>>> > >>> IIUC, do you mean add above code in ocfs2_commit_cache or in > >>> ocfs2_commit_thread? > >> > >> Yes, break the loop in ocfs2_commit_thread() in case of journal abort. > >> Actually we've handled this case before, but just limit the print. But > >> it seems not enough now. > > > > I wrote in my previous mail. Follow your idea, The code should be: > > > > if (is_journal_aborted(journal)) { > > ocfs2_error(osb->sb, "jbd2 status: ABORT.\n"); //this line is important. > > return -EROFS; > > } > > > > Only return -EROFS then break loop in ocfs2_commit_thread() is not enough. > > Without ocfs2_error(), ocfs2 could accept new IOs from userspace, then the new > > IOs will trigger IO error then trigger RO status. This flow is wrong, we should > > mark RO as early as possible when JBD ABORT happens. In my view, the best place > > is my [2/2] patch code which in ocfs2_commit_cache(). > > > > Agree, but ocfs2_abort() is more appropriate here, see > ocfs2_start_trans(). But ocfs2_abort() may panic system, I'm afraid it > has to change to read-only accordingly.You are right. I ever used ocfs2_abort() then got panic. :) So I changed to ocfs2_error(). I also sent two mails in 2023-4-30 (the first mail has mess format by thunderbird). I copy the mail content here: ``` More info for this patch, maybe I should write below info in commit log.>From the comment of __ocfs2_abort(), there is another way to handlejournal ABORT: panic. And fstest generic test NO. 648 will trigger ocfs2_abort then make kernel panic. I don't like the panic style, ocfs2 should elegantly handle journal abnormal case. ``` fstest NO. 648 could trigger ocfs2_abort from ocfs2_start_trans(). So you could find my previous mail, I ran the fstest with parameter '-e generic/648', which means fstest excludes NO. 648. The only different between ocfs2_abort and ocfs2_error is set OCFS2_MOUNT_ERRORS_PANIC, which could trigger panic. If you think panic is brute, I suggest to use ocfs2_error.> > >> BTW, the basic rule here is, we don't want to change journal to prevent > >> other nodes corrupting the filesystem. > > > > If my memory is correct, every node has itself special journal partition. > > If the HA stack assigns the same (before fenced) nodeid to JBD ABORTed machine, > > No other node could corrupting the fs. > > > I don't think it's a good idea to modify journal even flush fails. Why > not catch EROFS in commit thread and break? Now we can only expect > umount continues with error, right? >With your idea, I spent more than two hours for coding & testing. The result is my patch [2/2] is correct. Please believe my patch [2/2]. In patch [2/2], there is 'goto reset' which will resolve below three issues: ocfs2_commit_cache + ... ... reset: //goto label + ocfs2_inc_trans_id(journal); //<1.1> + atomic_set(&journal->j_num_trans, 0); //<2> + ocfs2_wake_downconvert_thread(osb); //<3> + wake_up(&journal->j_checkpointed); //<1.2> 1> trigger hanging umount ... ... ocfs2_clear_inode ocfs2_checkpoint_inode(inode) + ocfs2_ci_fully_checkpointed() //1.1: check '->j_trans_id' + wait_event(osb->journal->j_checkpointed, ...) + 1.2: will wait forever, if commit thread exits. 2> trigger BUG_ON() ocfs2_journal_shutdown + BUG_ON(atomic_read(&(osb->journal->j_num_trans)) != 0); 3> trigger dlm lock convert blocking if there is pending downconvert lock, no one kick it. At last, you could set up a fstest env and verify my writing. - Heming
Apparently Analagous Threads
- [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
- [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
- [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
- [PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
- [PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync