Heming Zhao
2023-Feb-15 10:02 UTC
[Ocfs2-devel] discuss about jbd2 assertion in defragment path
On 2/15/23 2:42 PM, Joseph Qi wrote:> > > On 2/15/23 2:20 PM, Heming Zhao wrote: >> On 2/15/23 10:06 AM, Joseph Qi wrote: >>> >>> >>> On 2/14/23 7:48 PM, Heming Zhao wrote: >>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>>> Hi, >>>>> >>>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>>> Hi, Sorry about the late reply. >>>>>>> This thread is indeed a long time ago:( >>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>>> to you. >>>>>>> >>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>>> makes sense from theory. >>>>>> >>>>>> ?From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>>> ocfs2_split_extent() is journal self-service. >>>>>> >>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>>> make sure all journal handle branches are correct. I already checked >>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>>> >>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>>> that's why we have to do a transaction here. >>>> >>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>>> >>>> __ocfs2_move_extent >>>> ? + ocfs2_journal_access_di //[a]?? <------+ >>>> ? + ocfs2_split_extent????? //[b]????????? | [a, b.1) area is read operations >>>> ? |? + if?????????????????? //[b.1] <------+ >>>> ? |? |? A >>>> ? |? |? +---- from this line, this func starts r/w IOs & needs jbd2 operations >>>> ? |? | >>>> ? |? |?? ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>>> ? |? + else >>>> ? |????? ocfs2_try_to_merge_extent () >>>> ? | >>>> ? + ocfs2_journal_dirty???? //[c] >>>> >>>>> >>>>> The whole code flow: >>>>> ocfs2_ioctl_move_extents >>>>> ?? ocfs2_move_extents >>>>> ???? __ocfs2_move_extents_range >>>>> ?????? ocfs2_defrag_extent? [1] >>>>> ???????? ocfs2_start_trans? [a] >>>>> ???????? __ocfs2_move_extent >>>>> ?????????? ocfs2_journal_access_di >>>>> ?????????? ocfs2_split_extent >>>>> ?????????? ocfs2_journal_dirty >>>>> ???????? ocfs2_commit_trans >>>>> ?????? ocfs2_move_extent? [2] >>>>> ???????? ocfs2_start_trans >>>>> ???????? __ocfs2_move_extent >>>>> ?????????? ocfs2_journal_access_di >>>>> ?????????? ocfs2_split_extent >>>>> ?????????? ocfs2_journal_dirty >>>>> ???????? ocfs2_commit_trans >>>>> ???? ocfs2_start_trans? [b] >>>>> ???? ocfs2_journal_access_di >>>>> ???? ocfs2_journal_dirty >>>>> ???? ocfs2_commit_trans >>>>> >>>>> In above, [a] and [b] are different transaction start/commit pair, and >>>>> each allocates different journal credits, even they are operate the same >>>>> bh, they are still different operations. >>>> >>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>>> is journal self-service. >>> >>> Don't understand what does 'journal self-service' mean. >> >> Sorry for my English skill, I invent this word 'journal self-service'. >> The meaning is: this function could handle journal operations totally by itself. >> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 >> journal start/stop pair. >> >> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, >> we even could add these two func in ocfs2_split_extent(). Then any caller won't >> take care of any journal operations when calling ocfs2_split_extent(). >> >>> >>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>>> changes), wanted [b] to protect inode ctime changes. the >>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>>> >>> If so, what is transaction [a] protect? (no dirty buffer) >> >> Transaction [a] protects the dirty buffers in ocfs2_split_extent(). >> ocfs2_split_extent() has already correctly used access/dirty pair to >> mark/handle dirty buffers. >> > Okay, I think I've gotten your idea now. You change looks reasonable. > Could you please also check if move extents without auto defrag also has > the same issue? >I have finished code checking. ocfs2_move_extent() path has the same issue. My patch also works fine from this code branch. In theory, ocfs2_move_extent() has less opportunity to trigger crash, because userspace defrag.ocfs2 very likely set less than file size on 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this code branch. If you think my patch is acceptable, I will file v2 patch & plan only to change commit log in v2. Thanks, Heming
Joseph Qi
2023-Feb-15 12:04 UTC
[Ocfs2-devel] discuss about jbd2 assertion in defragment path
On 2/15/23 6:02 PM, Heming Zhao wrote:> On 2/15/23 2:42 PM, Joseph Qi wrote: >> >> >> On 2/15/23 2:20 PM, Heming Zhao wrote: >>> On 2/15/23 10:06 AM, Joseph Qi wrote: >>>> >>>> >>>> On 2/14/23 7:48 PM, Heming Zhao wrote: >>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>>>> Hi, >>>>>> >>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>>>> Hi, Sorry about the late reply. >>>>>>>> This thread is indeed a long time ago:( >>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>>>> to you. >>>>>>>> >>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>>>> makes sense from theory. >>>>>>> >>>>>>> ??From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>>>> ocfs2_split_extent() is journal self-service. >>>>>>> >>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>>>> make sure all journal handle branches are correct. I already checked >>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>>>> >>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>>>> that's why we have to do a transaction here. >>>>> >>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>>>> >>>>> __ocfs2_move_extent >>>>> ?? + ocfs2_journal_access_di //[a]?? <------+ >>>>> ?? + ocfs2_split_extent????? //[b]????????? | [a, b.1) area is read operations >>>>> ?? |? + if?????????????????? //[b.1] <------+ >>>>> ?? |? |? A >>>>> ?? |? |? +---- from this line, this func starts r/w IOs & needs jbd2 operations >>>>> ?? |? | >>>>> ?? |? |?? ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>>>> ?? |? + else >>>>> ?? |????? ocfs2_try_to_merge_extent () >>>>> ?? | >>>>> ?? + ocfs2_journal_dirty???? //[c] >>>>> >>>>>> >>>>>> The whole code flow: >>>>>> ocfs2_ioctl_move_extents >>>>>> ??? ocfs2_move_extents >>>>>> ????? __ocfs2_move_extents_range >>>>>> ??????? ocfs2_defrag_extent? [1] >>>>>> ????????? ocfs2_start_trans? [a] >>>>>> ????????? __ocfs2_move_extent >>>>>> ??????????? ocfs2_journal_access_di >>>>>> ??????????? ocfs2_split_extent >>>>>> ??????????? ocfs2_journal_dirty >>>>>> ????????? ocfs2_commit_trans >>>>>> ??????? ocfs2_move_extent? [2] >>>>>> ????????? ocfs2_start_trans >>>>>> ????????? __ocfs2_move_extent >>>>>> ??????????? ocfs2_journal_access_di >>>>>> ??????????? ocfs2_split_extent >>>>>> ??????????? ocfs2_journal_dirty >>>>>> ????????? ocfs2_commit_trans >>>>>> ????? ocfs2_start_trans? [b] >>>>>> ????? ocfs2_journal_access_di >>>>>> ????? ocfs2_journal_dirty >>>>>> ????? ocfs2_commit_trans >>>>>> >>>>>> In above, [a] and [b] are different transaction start/commit pair, and >>>>>> each allocates different journal credits, even they are operate the same >>>>>> bh, they are still different operations. >>>>> >>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>>>> is journal self-service. >>>> >>>> Don't understand what does 'journal self-service' mean. >>> >>> Sorry for my English skill, I invent this word 'journal self-service'. >>> The meaning is: this function could handle journal operations totally by itself. >>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 >>> journal start/stop pair. >>> >>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, >>> we even could add these two func in ocfs2_split_extent(). Then any caller won't >>> take care of any journal operations when calling ocfs2_split_extent(). >>> >>>> >>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>>>> changes), wanted [b] to protect inode ctime changes. the >>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>>>> >>>> If so, what is transaction [a] protect? (no dirty buffer) >>> >>> Transaction [a] protects the dirty buffers in ocfs2_split_extent(). >>> ocfs2_split_extent() has already correctly used access/dirty pair to >>> mark/handle dirty buffers. >>> >> Okay, I think I've gotten your idea now. You change looks reasonable. >> Could you please also check if move extents without auto defrag also has >> the same issue? >> > > I have finished code checking. ocfs2_move_extent() path has the same issue. > My patch also works fine from this code branch. > In theory, ocfs2_move_extent() has less opportunity to trigger crash, because > userspace defrag.ocfs2 very likely set less than file size on > 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this > code branch. >If we write a simple program to do move extents ioctl on the same ocfs2 (without auto defrag), can we trigger this crash?> If you think my patch is acceptable, I will file v2 patch & plan only to change > commit log in v2. >Sure, with above confirmed.