Heming Zhao
2023-May-01 04:51 UTC
[Ocfs2-devel] [PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync
On Mon, May 01, 2023 at 11:21:15AM +0800, Joseph Qi wrote:
>
>
> On 5/1/23 11:09 AM, Heming Zhao wrote:
> > On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/1/23 10:26 AM, Heming Zhao wrote:
> >>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> >>>> Hi,
> >>>>
> >>>> What's the journal status in this case?
> >>> jbd2 is marked with JBD2_ABORT.
> >>>
> >>>> I wonder why commit thread is not working, which should flush journal
> >>>> and reset j_num_trans during commit cache.
> >>> The reason is in my answer above.
> >>>
> >>> Below is the dmesg log for generic/266.
> >>>
> >>> ```
> >>> [ 595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>> [ 595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>> [ 595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>> [ 595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>> [ 595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>> [ 595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>> [ 595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>> [ 595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>> [ 595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>> [ 595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>> [ 595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>> [ 595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>> [ 595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
> >>> [ 595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
> >>> [ 596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
> >>> [ 596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>> [ 596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
> >>> [ 596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
> >>> [ 596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
> >>> [ 596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
> >>> [ 596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
> >>> [ 596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
> >>> [ 596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
> >>> [ 596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
> >>> [ 596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
> >>> [ 596.224616] kernel: JBD2: Ignoring recovery information on journal
> >>> [ 596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>> [ 596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>> [ 596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
> >>> [ 596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>> [ 596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
> >>> [ 596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
> >>> [ 596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>> [ 596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
> >>> [ 596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
> >>> [ 596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>> [ 596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
> >>> [ 596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
> >>> [ 596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
> >>> [ 596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>> [ 596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>> [ 596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>> [ 596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>> [ 596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>> [ 596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>> [ 596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>> [ 596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>> [ 596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>> [ 596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>> [ 596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>> [ 596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>> [ 597.139758] root[5475]: run xfstest generic/266
> >>> [ 596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
> >>> [ 597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
> >>> [ 597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>> [ 597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>> [ 597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>> [ 597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>> [ 597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>> [ 597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>> [ 597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>> [ 597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>> [ 597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>> [ 597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
> >>> [ 597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>> [ 597.590136] kernel: JBD2: Ignoring recovery information on journal
> >>> [ 597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>> [ 598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>> [ 597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
> >>> [ 597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>> [ 598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
> >>> [ 597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
> >>> [ 597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>> [ 597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>> [ 597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>> [ 597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>> [ 597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>> [ 597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>> [ 597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>> [ 597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>> [ 597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>> [ 597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>> [ 597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
> >>> [ 597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>> [ 600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
> >>> [ 600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
> >>> [ 600.209151] systemd[1]: Starting DBus interface for snapper...
> >>> [ 600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
> >>> [ 600.217370] systemd[1]: Started DBus interface for snapper.
> >>> [ 600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
> >>> [ 599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>> [ 599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>> [ 599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>> [ 599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>> [ 599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>> [ 599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>> [ 599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>> [ 599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>> [ 599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>> [ 599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
> >>> [ 599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>> [ 599.615320] kernel: JBD2: Ignoring recovery information on journal
> >>> [ 599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>> [ 600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>> [ 599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
> >>> [ 599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>> [ 599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
> >>> [ 600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
> >>> [ 599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
> >>> [ 599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>> [ 599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>> [ 599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>> [ 599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>> [ 599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>> [ 599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>> [ 599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>> [ 599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>> [ 599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>> [ 599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
> >>> [ 599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>> [ 599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>> [ 600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
> >>> [ 600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>> [ 600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>> [ 600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>> [ 600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>> [ 600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>> [ 600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>> [ 600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>> [ 600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>> [ 600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>> [ 600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>> [ 600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>> [ 600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>> [ 600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
> >>> [ 600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
> >>> [ 600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
> >>> [ 600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
> >>> [ 600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
> >>> [ 600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
> >>> [ 600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>> [ 600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>> [ 600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
> >>> [ 600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
> >>> [ 600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>> [ 600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>> [ 600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>> [ 600.127615] kernel: Aborting journal on device dm-0-24.
> >>> [ 600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>> [ 600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>> [ 600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>> [ 601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>> [ 616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> [ 620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>> ^C
> >>> tb-fstest1:/usr/src/linux/fs/ocfs2 #
> >>> ```
> >>
> >> So the case is, flush journal in commit thread will never end if journal
> >> is aborted, and that leads to an umount hang since the commit thread
> >> won't stop, right?
> > Yes.
> >
> >>
> >> If so, I don't think your change in sync fs is the right solution. It
> >> seems it's just for bypassing the testcase.
> > No.
> > In my view, generic/266 shouldn't trigger the jbd2 ABORT status. The 'sync'
> > should flush all already-issued write IOs. After sync & loading dm-error,
> > new write IOs fail, which shouldn't generate any journal IOs and shouldn't
> > trigger jbd2 to enter ABORT status.
> >
>
> What if there is no 'sync' in this case?
It is the same as the 'sync' case. Please compare 'after [ 600.079285]' in the
log above with 'after [ 207.623776]' in the log below (the output of
'journalctl -f' when 'sync' is commented out in generic/266):

```
[ 204.125234] root[2136]: run xfstest generic/266
[ 203.943897] unknown: run fstests generic/266 at 2023-05-01 12:40:37
[ 204.179159] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
[ 205.000897] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: joining the lockspace group...
[ 205.006914] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover 1
[ 205.006919] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: group event done 0
[ 205.007771] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: add member 1
[ 205.008251] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_members 1 nodes
[ 205.008641] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: generation 1 slots 1 1:1
[ 205.009043] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory
[ 205.009305] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory 0 in 0 new
[ 205.009640] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory 0 out 0 messages
[ 205.010061] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover 1 generation 1 done: 0 ms
[ 205.010399] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: join complete
[ 205.016329] kernel: JBD2: Ignoring recovery information on journal
[ 205.020398] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
[ 205.213841] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[ 205.218581] ocfs2_hb_ctl[2342]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F8E488BAE771460B89ABD355D42B0264
[ 205.033443] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: leaving the lockspace group...
[ 205.034046] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: group event done 0
[ 205.034296] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: release_lockspace final free
[ 205.040937] kernel: ocfs2: Unmounting device (253,32) on (node 1)
[ 205.066896] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: joining the lockspace group...
[ 205.077350] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover 1
[ 205.077419] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: group event done 0
[ 205.078070] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: add member 1
[ 205.078298] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_members 1 nodes
[ 205.078579] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: generation 1 slots 1 1:1
[ 205.078823] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory
[ 205.079081] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory 0 in 0 new
[ 205.079276] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover_directory 0 out 0 messages
[ 205.079493] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: dlm_recover 1 generation 1 done: 0 ms
[ 205.079728] kernel: dlm: F8E488BAE771460B89ABD355D42B0264: join complete
[ 207.150959] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: joining the lockspace group...
[ 207.154653] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: group event done 0
[ 207.154724] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover 1
[ 207.155282] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: add member 1
[ 207.155587] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_members 1 nodes
[ 207.155935] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: generation 1 slots 1 1:1
[ 207.156220] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory
[ 207.156584] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory 0 in 0 new
[ 207.156860] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory 0 out 0 messages
[ 207.157130] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover 1 generation 1 done: 0 ms
[ 207.157453] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: join complete
[ 207.161303] kernel: JBD2: Ignoring recovery information on journal
[ 207.165650] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
[ 207.492458] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[ 207.334595] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: leaving the lockspace group...
[ 207.338685] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: group event done 0
[ 207.342262] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: release_lockspace final free
[ 207.530465] ocfs2_hb_ctl[2410]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 0304961B816D4F5CA1E793BD4655E027
[ 207.365106] kernel: ocfs2: Unmounting device (254,0) on (node 1)
[ 207.394907] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: joining the lockspace group...
[ 207.408281] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover 1
[ 207.408344] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: group event done 0
[ 207.409397] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: add member 1
[ 207.409826] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_members 1 nodes
[ 207.410239] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: generation 1 slots 1 1:1
[ 207.410699] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory
[ 207.411147] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory 0 in 0 new
[ 207.411579] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover_directory 0 out 0 messages
[ 207.412579] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: dlm_recover 1 generation 1 done: 4 ms
[ 207.413093] kernel: dlm: 0304961B816D4F5CA1E793BD4655E027: join complete
[ 207.420774] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
[ 207.623776] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
[ 207.627307] kernel: (xfs_io,2432,2):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 207.627487] kernel: (xfs_io,2432,2):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 207.627635] kernel: (xfs_io,2432,2):ocfs2_statfs:1654 ERROR: status = -5
[ 207.627771] kernel: (xfs_io,2432,2):ocfs2_statfs:1686 ERROR: status = -5
[ 207.628009] kernel: (xfs_io,2432,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 207.628275] kernel: (xfs_io,2432,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 207.628521] kernel: (xfs_io,2432,0):ocfs2_statfs:1654 ERROR: status = -5
[ 207.628786] kernel: (xfs_io,2432,0):ocfs2_statfs:1686 ERROR: status = -5
[ 207.629079] kernel: (xfs_io,2432,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 207.629356] kernel: (xfs_io,2432,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 207.629628] kernel: (xfs_io,2432,0):ocfs2_statfs:1654 ERROR: status = -5
[ 207.629868] kernel: (xfs_io,2432,0):ocfs2_statfs:1686 ERROR: status = -5
[ 207.632449] kernel: (xfs_io,2432,0):ocfs2_get_refcount_tree:358 ERROR: status = -5
[ 207.632807] kernel: (xfs_io,2432,0):ocfs2_lock_refcount_tree:456 ERROR: status = -5
[ 207.633186] kernel: (xfs_io,2432,0):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
[ 207.633467] kernel: (xfs_io,2432,0):ocfs2_refcount_cow:3478 ERROR: status = -5
[ 207.633736] kernel: (xfs_io,2432,0):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
[ 207.634022] kernel: (xfs_io,2432,0):ocfs2_file_write_iter:2433 ERROR: status = -5
[ 207.634437] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
[ 207.634636] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
[ 207.634788] kernel: Aborting journal on device dm-0-24.
[ 207.634942] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
[ 207.635097] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
[ 207.635339] kernel: (xfs_io,2432,0):ocfs2_sync_file:201 ERROR: status = -5
[ 207.638738] kernel: (umount,2436,2):ocfs2_inode_lock_update:2354 ERROR: status = -5
[ 207.638939] kernel: (umount,2436,2):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
[ 207.639112] kernel: (umount,2436,2):ocfs2_statfs:1654 ERROR: status = -5
[ 207.639281] kernel: (umount,2436,2):ocfs2_statfs:1686 ERROR: status = -5
[ 207.823818] systemd[1]: fstest-scratch.mount: Deactivated successfully.
[ 207.642501] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 207.642689] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
[ 208.648030] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 209.672049] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 210.696057] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 211.720064] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 212.744068] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 213.768081] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 214.792079] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 215.816079] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 216.840090] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 217.863993] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 218.888057] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 219.912099] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 220.936106] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 221.960112] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 222.984116] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 222.984857] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
[ 224.008130] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 225.032123] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
[ 226.056075] kernel: (ocfs2cmt-030496,2419,0):ocfs2_commit_cache:321 ERROR: status = -5
^C
```

- Heming
Joseph Qi
2023-May-01 11:44 UTC
[Ocfs2-devel] [PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync
On 5/1/23 12:51 PM, Heming Zhao wrote:> On Mon, May 01, 2023 at 11:21:15AM +0800, Joseph Qi wrote: >> >> >> On 5/1/23 11:09 AM, Heming Zhao wrote: >>> On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote: >>>> >>>> >>>> On 5/1/23 10:26 AM, Heming Zhao wrote: >>>>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote: >>>>>> Hi, >>>>>> >>>>>> What's the journal status in this case? >>>>> jbd2 is marked with JBD2_ABORT >>>>> >>>>>> I wonder why commit thread is not working, which should flush journal >>>>>> and reset j_num_trans during commit cache. >>>>> The reason is in my above answer. >>>>> >>>>> below is the dmesg log for No. 266. >>>>> >>>>> ``` >>>>> [ 595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group... >>>>> [ 595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0 >>>>> [ 595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 >>>>> [ 595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1 >>>>> [ 595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes >>>>> [ 595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1 >>>>> [ 595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory >>>>> [ 595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new >>>>> [ 595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages >>>>> [ 595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms >>>>> [ 595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete >>>>> [ 595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode. >>>>> [ 595.787078] systemd[1]: Started /usr/bin/bash -c exit 77. >>>>> [ 595.802650] systemd[1]: fstests-check.scope: Deactivated successfully. 
>>>>> [ 596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group... >>>>> [ 596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0 >>>>> [ 596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 >>>>> [ 596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1 >>>>> [ 596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes >>>>> [ 596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1 >>>>> [ 596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory >>>>> [ 596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new >>>>> [ 596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages >>>>> [ 596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms >>>>> [ 596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete >>>>> [ 596.224616] kernel: JBD2: Ignoring recovery information on journal >>>>> [ 596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode. >>>>> [ 596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully. >>>>> [ 596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group... >>>>> [ 596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0 >>>>> [ 596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free >>>>> [ 596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0 >>>>> [ 596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1) >>>>> [ 596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group... >>>>> [ 596.905718] systemd[1]: fstest-test.mount: Deactivated successfully. 
>>>>> [ 596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0 >>>>> [ 596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free >>>>> [ 596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1 >>>>> [ 596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1) >>>>> [ 596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group... >>>>> [ 596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 >>>>> [ 596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0 >>>>> [ 596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1 >>>>> [ 596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes >>>>> [ 596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1 >>>>> [ 596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory >>>>> [ 596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new >>>>> [ 596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages >>>>> [ 596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms >>>>> [ 596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete >>>>> [ 596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode. >>>>> [ 597.139758] root[5475]: run xfstest generic/266 >>>>> [ 596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02 >>>>> [ 597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266. >>>>> [ 597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group... 
>>>>> [ 597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>>>> [ 597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
>>>>> [ 597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
>>>>> [ 597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
>>>>> [ 597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
>>>>> [ 597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
>>>>> [ 597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
>>>>> [ 597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
>>>>> [ 597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
>>>>> [ 597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
>>>>> [ 597.590136] kernel: JBD2: Ignoring recovery information on journal
>>>>> [ 597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
>>>>> [ 598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>>>> [ 597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
>>>>> [ 597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>>>> [ 598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
>>>>> [ 597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
>>>>> [ 597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
>>>>> [ 597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
>>>>> [ 597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
>>>>> [ 597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
>>>>> [ 597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
>>>>> [ 597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
>>>>> [ 597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
>>>>> [ 597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
>>>>> [ 597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
>>>>> [ 597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
>>>>> [ 597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
>>>>> [ 597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
>>>>> [ 600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
>>>>> [ 600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
>>>>> [ 600.209151] systemd[1]: Starting DBus interface for snapper...
>>>>> [ 600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
>>>>> [ 600.217370] systemd[1]: Started DBus interface for snapper.
>>>>> [ 600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
>>>>> [ 599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
>>>>> [ 599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
>>>>> [ 599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>>>> [ 599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
>>>>> [ 599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
>>>>> [ 599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
>>>>> [ 599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
>>>>> [ 599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
>>>>> [ 599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
>>>>> [ 599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
>>>>> [ 599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
>>>>> [ 599.615320] kernel: JBD2: Ignoring recovery information on journal
>>>>> [ 599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
>>>>> [ 600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>>>> [ 599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
>>>>> [ 599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>>>> [ 599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
>>>>> [ 600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
>>>>> [ 599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
>>>>> [ 599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
>>>>> [ 599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
>>>>> [ 599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
>>>>> [ 599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
>>>>> [ 599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
>>>>> [ 599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
>>>>> [ 599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
>>>>> [ 599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
>>>>> [ 599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
>>>>> [ 599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
>>>>> [ 599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
>>>>> [ 599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
>>>>> [ 600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
>>>>> [ 600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>>>> [ 600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>>>> [ 600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>>>> [ 600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>>>> [ 600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>>>> [ 600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>>>> [ 600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>>>> [ 600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>>>> [ 600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>>>> [ 600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>>>> [ 600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
>>>>> [ 600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
>>>>> [ 600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
>>>>> [ 600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
>>>>> [ 600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
>>>>> [ 600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
>>>>> [ 600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
>>>>> [ 600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
>>>>> [ 600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
>>>>> [ 600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
>>>>> [ 600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
>>>>> [ 600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
>>>>> [ 600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
>>>>> [ 600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
>>>>> [ 600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
>>>>> [ 600.127615] kernel: Aborting journal on device dm-0-24.
>>>>> [ 600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
>>>>> [ 600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
>>>>> [ 600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
>>>>> [ 601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
>>>>> [ 616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> [ 620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
>>>>> ^C
>>>>> tb-fstest1:/usr/src/linux/fs/ocfs2 #
>>>>> ```
>>>>
>>>> So the case is, flushing the journal in the commit thread will never end if the
>>>> journal is aborted, and then leads to a umount hang since the commit thread
>>>> won't stop, right?
>>> Yes.
>>>
>>>>
>>>> If so, I don't think your change in sync fs is the right solution. It
>>>> seems it is just for bypassing the testcase.
>>> No.
>>> In my view, No. 266 shouldn't trigger the jbd2 ABORT status. The 'sync' should
>>> tidy up all already-issued write IOs. After sync & loading dm-error, new write
>>> IOs fail, which shouldn't generate any journal IOs and shouldn't trigger
>>> jbd2 to enter the ABORT status.
>>>
>>
>> What if there is no 'sync' in this case?
>
> It is the same as the 'sync' case. Please compare:
> 'after [ 600.079285] in the above log' vs. 'after [ 207.623776] in the below log'
>
> I mean your solution is to reset j_num_trans in sync fs, but in case of
> no 'sync', how do you resolve it?