thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH] ocfs2: fix a null pointer derefrence in ocfs2_block_group_clear

lishan

2020-Mar-09 08:26 UTC

[Ocfs2-devel] [PATCH] ocfs2: fix a null pointer derefrence in ocfs2_block_group_clear_bits()

A NULL pointer dereference panic in ocfs2_block_group_clear_bits() happened again.
The stack at the time of the panic is as follows:

PID: 81866  TASK: ffffa07c3c21ae80  CPU: 66  COMMAND: "fallocate"
  #0 [ffff0000b4d6b0b0] machine_kexec at ffff0000800a2954
  #1 [ffff0000b4d6b110] __crash_kexec at ffff0000801bab34
  #2 [ffff0000b4d6b2a0] panic at ffff0000800f02cc
  #3 [ffff0000b4d6b380] die at ffff00008008f6ac
  #4 [ffff0000b4d6b3d0] bug_handler at ffff00008008f744
  #5 [ffff0000b4d6b400] brk_handler at ffff000080085d1c
  #6 [ffff0000b4d6b420] do_debug_exception at ffff000080081194
  #7 [ffff0000b4d6b630] el1_dbg at ffff00008008332c
      PC: ffff00000190e9c0  [_ocfs2_free_suballoc_bits+1608]
      LR: ffff00000190e990  [_ocfs2_free_suballoc_bits+1560]
      SP: ffff0000b4d6b640  PSTATE: 60400009
     X29: ffff0000b4d6b650  X28: 0000000000000000  X27: 00000000000052f3
     X26: ffff807c511a9570  X25: ffff807ca0054000  X24: 00000000000052f2
     X23: 0000000000000001  X22: ffff807c7cde7a90  X21: ffff0000811d9000
     X20: ffff807c5e7d2000  X19: ffff00000190c768  X18: 0000000000000000
     X17: 0000000000000000  X16: ffff000080a032f0  X15: 0000000000000000
     X14: ffffffffffffffff  X13: fffffffffffffff7  X12: ffffffffffffffff
     X11: 0000000000000038  X10: 0101010101010101   X9: ffffffffffffffff
      X8: 7f7f7f7f7f7f7f7f   X7: 0000000000000000   X6: 0000000000000080
      X5: 0000000000000000   X4: 0000000000000002   X3: ffff00000199f390
      X2: a603c08321456e00   X1: ffff807c7cde7a90   X0: 0000000000000000
  #8 [ffff0000b4d6b650] _ocfs2_free_suballoc_bits at ffff00000190e9bc [ocfs2]
  #9 [ffff0000b4d6b710] _ocfs2_free_clusters at ffff0000019110d4 [ocfs2]
 #10 [ffff0000b4d6b790] ocfs2_free_clusters at ffff000001913e94 [ocfs2]
 #11 [ffff0000b4d6b7d0] __ocfs2_flush_truncate_log at ffff0000018b5294 [ocfs2]
 #12 [ffff0000b4d6b8a0] ocfs2_remove_btree_range at ffff0000018bb34c [ocfs2]
 #13 [ffff0000b4d6b960] ocfs2_commit_truncate at ffff0000018bc76c [ocfs2]
 #14 [ffff0000b4d6ba60] ocfs2_wipe_inode at ffff0000018e57bc [ocfs2]
 #15 [ffff0000b4d6bb00] ocfs2_evict_inode at ffff0000018e5db8 [ocfs2]
 #16 [ffff0000b4d6bb70] evict at ffff000080365040
 #17 [ffff0000b4d6bba0] iput at ffff0000803655d8
 #18 [ffff0000b4d6bbe0] ocfs2_dentry_iput at ffff0000018c60a0 [ocfs2]
 #19 [ffff0000b4d6bc30] dentry_unlink_inode at ffff00008035ef58
 #20 [ffff0000b4d6bc50] __dentry_kill at ffff000080360384
 #21 [ffff0000b4d6bc80] dentry_kill at ffff000080360670
 #22 [ffff0000b4d6bcb0] dput at ffff00008036093c
 #23 [ffff0000b4d6bcf0] __fput at ffff000080343930
 #24 [ffff0000b4d6bd40] ____fput at ffff000080343aac
 #25 [ffff0000b4d6bd60] task_work_run at ffff0000801172fc

The direct cause of the panic is that bh2jh(group_bh)->b_committed_data is NULL.
Presumably the network was disconnected during the write,
causing the transaction to abort, as follows:
jbd2_journal_abort
  .......
  jbd2_journal_commit_transaction
    jh->b_committed_data = NULL;

_ocfs2_free_suballoc_bits
  ocfs2_block_group_clear_bits
    // undo_bg is now set to null
    BUG_ON(!undo_bg);

When allocating free space, if b_committed_data is NULL,
the bit is treated as allocatable right away, as follows:
ocfs2_cluster_group_search
  ocfs2_block_group_find_clear_bits
    ocfs2_test_bg_bit_allocatable:
      bg = (struct ocfs2_group_desc *) bh2jh(bg_bh)->b_committed_data;
      if (bg)
        ret = !ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap);
      else
        ret = 1;
b_committed_data is an intermediate-state backup of the bitmap used while a
transaction commits, and newly allocated space can overwrite previously dirty data.
So I think that when freeing clusters, if b_committed_data is NULL, we can ignore it.
Panicking the host directly is too drastic.

Signed-off-by: Shan Li <lishan24 at huawei.com>
Reviewed-by: Jun Piao <piaojun at huawei.com>
---
 fs/ocfs2/suballoc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 939df99d2dec..aaf1b3cbd984 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -2412,14 +2412,19 @@ static int ocfs2_block_group_clear_bits(handle_t *handle,
 	if (undo_fn) {
 		spin_lock(&jh->b_state_lock);
 		undo_bg = (struct ocfs2_group_desc *) jh->b_committed_data;
-		BUG_ON(!undo_bg);
+		if (!undo_bg)
+			mlog(ML_NOTICE, "%s: group descriptor # %llu (device %s) journal "
+					"b_committed_data had been cleared.\n",
+					OCFS2_SB(alloc_inode->i_sb)->uuid_str,
+					(unsigned long long)le64_to_cpu(bg->bg_blkno),
+					alloc_inode->i_sb->s_id);
 	}

 	tmp = num_bits;
 	while(tmp--) {
 		ocfs2_clear_bit((bit_off + tmp),
 				(unsigned long *) bg->bg_bitmap);
-		if (undo_fn)
+		if (undo_fn && undo_bg)
 			undo_fn(bit_off + tmp,
 				(unsigned long *) undo_bg->bg_bitmap);
 	}
-- 
2.19.1
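
The behaviour the patch aims for can be modelled outside the kernel. The following is a minimal user-space sketch, not the ocfs2 code: every `model_*` name is hypothetical, plain `unsigned long` arrays stand in for `bg->bg_bitmap` and the `b_committed_data` copy, and the notice goes to stderr instead of `mlog()`. It shows the live bitmap always being cleared, the undo callback being applied only when the committed copy still exists, and the committed-copy branch of the allocatable test quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_BITS_PER_LONG (8 * sizeof(unsigned long))

/* Hypothetical stand-in for the undo_fn callback type used by the patch. */
typedef void (*model_undo_fn)(unsigned int bit, unsigned long *bitmap);

static void model_set_bit(unsigned int bit, unsigned long *bitmap)
{
	bitmap[bit / MODEL_BITS_PER_LONG] |= 1UL << (bit % MODEL_BITS_PER_LONG);
}

static void model_clear_bit(unsigned int bit, unsigned long *bitmap)
{
	bitmap[bit / MODEL_BITS_PER_LONG] &= ~(1UL << (bit % MODEL_BITS_PER_LONG));
}

static bool model_test_bit(unsigned int bit, const unsigned long *bitmap)
{
	return (bitmap[bit / MODEL_BITS_PER_LONG] >> (bit % MODEL_BITS_PER_LONG)) & 1UL;
}

/*
 * Model of the patched loop: always clear the bits in the live bitmap,
 * and call undo_fn on the committed copy only if that copy is present.
 * Returns true when the committed copy was updated.
 */
static bool model_clear_bits(unsigned long *live, unsigned long *committed,
			     unsigned int bit_off, unsigned int num_bits,
			     model_undo_fn undo_fn)
{
	unsigned int tmp = num_bits;
	bool undo_applied = undo_fn && committed;

	assert(live != NULL);
	if (undo_fn && !committed)
		fprintf(stderr, "notice: committed copy had been cleared\n");

	while (tmp--) {
		model_clear_bit(bit_off + tmp, live);
		if (undo_applied)
			undo_fn(bit_off + tmp, committed);
	}
	return undo_applied;
}

/*
 * Model of the committed-copy branch quoted from
 * ocfs2_test_bg_bit_allocatable(): with no committed copy, the bit is
 * reported as allocatable.
 */
static bool model_bit_allocatable(const unsigned long *committed, unsigned int nr)
{
	if (committed)
		return !model_test_bit(nr, committed);
	return true;
}
```

Calling `model_clear_bits(live, NULL, 0, 2, model_set_bit)` clears two bits in `live`, prints the notice, and returns false instead of aborting, which is the trade-off the patch proposes in place of `BUG_ON()`.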

lishan

2020-Mar-10 13:07 UTC

ping ?
cc Andrew Morton

Joseph Qi

2020-Mar-11 01:15 UTC

On 2020/3/9 16:26, lishan wrote:
> A NULL pointer panic dereference in ocfs2_block_group_clear_bits() happen again.
> @@ -2412,14 +2412,19 @@ static int ocfs2_block_group_clear_bits(handle_t *handle,
>  	if (undo_fn) {
>  		spin_lock(&jh->b_state_lock);
>  		undo_bg = (struct ocfs2_group_desc *) jh->b_committed_data;
> -		BUG_ON(!undo_bg);
> +		if (!undo_bg)
> +			mlog(ML_NOTICE, "%s: group descriptor # %llu (device %s) journal "
> +					"b_committed_data had been cleared.\n",
> +					OCFS2_SB(alloc_inode->i_sb)->uuid_str,
> +					(unsigned long long)le64_to_cpu(bg->bg_blkno),
> +					alloc_inode->i_sb->s_id);
Seems a kind of workaround.
I am worrying about other abnormal cases of NULL b_committed_data, it
may lead to a corrupt filesystem.
So how about isolating the journal abort case?

Thanks,
Joseph
>  	}
> 
>  	tmp = num_bits;
>  	while(tmp--) {
>  		ocfs2_clear_bit((bit_off + tmp),
>  				(unsigned long *) bg->bg_bitmap);
> -		if (undo_fn)
> +		if (undo_fn && undo_bg)
>  			undo_fn(bit_off + tmp,
>  				(unsigned long *) undo_bg->bg_bitmap);
>  	}
>
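
One possible reading of the suggestion to isolate the journal abort case can be sketched in user space as follows. This is an illustration under stated assumptions, not code from the thread: all `sketch_*` names are hypothetical, the `journal_aborted` flag stands in for whatever check the kernel code would use (jbd2 provides `is_journal_aborted()`, but wiring it in here is an assumption), the undo step is hard-coded to setting the bit, and returning `-EIO` for the unexpected case is only one of several plausible policies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))

static void sketch_set_bit(unsigned int bit, unsigned long *bitmap)
{
	bitmap[bit / SKETCH_BITS_PER_LONG] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
}

static void sketch_clear_bit(unsigned int bit, unsigned long *bitmap)
{
	bitmap[bit / SKETCH_BITS_PER_LONG] &= ~(1UL << (bit % SKETCH_BITS_PER_LONG));
}

/*
 * Clear num_bits bits starting at bit_off in the live bitmap.
 *
 * - committed copy present: mirror the change into it (normal path).
 * - committed copy missing and journal aborted: tolerate it and skip
 *   the undo update (the case described in the patch).
 * - committed copy missing and journal NOT aborted: refuse with -EIO
 *   and leave the live bitmap untouched, so that an unexplained NULL
 *   is not silently papered over.
 */
static int sketch_clear_bits(unsigned long *live, unsigned long *committed,
			     bool journal_aborted,
			     unsigned int bit_off, unsigned int num_bits)
{
	unsigned int tmp = num_bits;

	assert(live != NULL);
	if (!committed && !journal_aborted)
		return -EIO;

	while (tmp--) {
		sketch_clear_bit(bit_off + tmp, live);
		if (committed)
			sketch_set_bit(bit_off + tmp, committed);
	}
	return 0;
}
```

The design choice here is that only the known journal-abort situation takes the lenient path. Any other NULL committed copy is reported to the caller, which addresses the concern that tolerating every NULL could hide filesystem corruption.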

Andrew Morton

2020-Mar-12 01:40 UTC

On Tue, 10 Mar 2020 21:07:06 +0800 lishan <lishan24 at huawei.com> wrote:
(top-posting repaired)
> ping ?
> cc Andrew Morton
There's something dreadfully wrong with the ocfs2-devel mailing list. 
I almost never receive patches when people add me to cc.  I've never
seen much of a pattern to it - it just drops stuff everywhere.

Who is the admin for this list?  Can we please move it to kernel.org?

Regarding this patch, I did actually receive one email from the
server.

From Joseph:
: > +		if (!undo_bg)
: > +			mlog(ML_NOTICE, "%s: group descriptor # %llu (device %s) journal "
: > +					"b_committed_data had been cleared.\n",
: > +					OCFS2_SB(alloc_inode->i_sb)->uuid_str,
: > +					(unsigned long long)le64_to_cpu(bg->bg_blkno),
: > +					alloc_inode->i_sb->s_id);
: 
: Seems a kind of workaround.
: I am worrying about other abnormal cases of NULL b_committed_data, it
: may lead to a corrupt filesystem.
: So how about isolating the journal abort case?
