Joseph Qi
2023-Feb-15 06:42 UTC
[Ocfs2-devel] discuss about jbd2 assertion in defragment path
On 2/15/23 2:20 PM, Heming Zhao wrote:> On 2/15/23 10:06 AM, Joseph Qi wrote: >> >> >> On 2/14/23 7:48 PM, Heming Zhao wrote: >>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>> Hi, >>>> >>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>> Hi, Sorry about the late reply. >>>>>> This thread is indeed a long time ago:( >>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>> to you. >>>>>> >>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>> makes sense from theory. >>>>> >>>>> ?From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>> ocfs2_split_extent() is journal self-service. >>>>> >>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>> make sure all journal handle branches are correct. I already checked >>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>> >>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>> that's why we have to do a transaction here. >>> >>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>> >>> __ocfs2_move_extent >>> ? + ocfs2_journal_access_di //[a]?? <------+ >>> ? + ocfs2_split_extent????? //[b]????????? | [a, b.1) area is read operations >>> ? |? + if?????????????????? //[b.1] <------+ >>> ? |? |? A >>> ? |? |? +---- from this line, this func starts r/w IOs & needs jbd2 operations >>> ? |? | >>> ? |? |?? ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>> ? |? + else >>> ? |????? ocfs2_try_to_merge_extent () >>> ? | >>> ? + ocfs2_journal_dirty???? //[c] >>> >>>> >>>> The whole code flow: >>>> ocfs2_ioctl_move_extents >>>> ?? ocfs2_move_extents >>>> ???? __ocfs2_move_extents_range >>>> ?????? ocfs2_defrag_extent? [1] >>>> ???????? ocfs2_start_trans? [a] >>>> ???????? __ocfs2_move_extent >>>> ?????????? ocfs2_journal_access_di >>>> ?????????? ocfs2_split_extent >>>> ?????????? ocfs2_journal_dirty >>>> ???????? ocfs2_commit_trans >>>> ?????? ocfs2_move_extent? [2] >>>> ???????? ocfs2_start_trans >>>> ???????? __ocfs2_move_extent >>>> ?????????? ocfs2_journal_access_di >>>> ?????????? ocfs2_split_extent >>>> ?????????? ocfs2_journal_dirty >>>> ???????? ocfs2_commit_trans >>>> ???? ocfs2_start_trans? [b] >>>> ???? ocfs2_journal_access_di >>>> ???? ocfs2_journal_dirty >>>> ???? ocfs2_commit_trans >>>> >>>> In above, [a] and [b] are different transaction start/commit pair, and >>>> each allocates different journal credits, even they are operate the same >>>> bh, they are still different operations. >>> >>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>> is journal self-service. >> >> Don't understand what does 'journal self-service' mean. > > Sorry for my English skill, I invent this word 'journal self-service'. > The meaning is: this function could handle journal operations totally by itself. > Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 > journal start/stop pair. > > Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, > we even could add these two func in ocfs2_split_extent(). Then any caller won't > take care of any journal operations when calling ocfs2_split_extent(). > >> >>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>> changes), wanted [b] to protect inode ctime changes. the >>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>> >> If so, what is transaction [a] protect? (no dirty buffer) > > Transaction [a] protects the dirty buffers in ocfs2_split_extent(). > ocfs2_split_extent() has already correctly used access/dirty pair to > mark/handle dirty buffers. >Okay, I think I've gotten your idea now. You change looks reasonable. Could you please also check if move extents without auto defrag also has the same issue? Thanks, Joseph
Heming Zhao
2023-Feb-15 07:29 UTC
[Ocfs2-devel] discuss about jbd2 assertion in defragment path
On 2/15/23 2:42 PM, Joseph Qi wrote:> > > On 2/15/23 2:20 PM, Heming Zhao wrote: >> On 2/15/23 10:06 AM, Joseph Qi wrote: >>> >>> >>> On 2/14/23 7:48 PM, Heming Zhao wrote: >>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>>> Hi, >>>>> >>>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>>> Hi, Sorry about the late reply. >>>>>>> This thread is indeed a long time ago:( >>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>>> to you. >>>>>>> >>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>>> makes sense from theory. >>>>>> >>>>>> ?From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>>> ocfs2_split_extent() is journal self-service. >>>>>> >>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>>> make sure all journal handle branches are correct. I already checked >>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>>> >>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>>> that's why we have to do a transaction here. >>>> >>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>>> >>>> __ocfs2_move_extent >>>> ? + ocfs2_journal_access_di //[a]?? <------+ >>>> ? + ocfs2_split_extent????? //[b]????????? | [a, b.1) area is read operations >>>> ? |? + if?????????????????? //[b.1] <------+ >>>> ? |? |? A >>>> ? |? |? +---- from this line, this func starts r/w IOs & needs jbd2 operations >>>> ? |? | >>>> ? |? |?? ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>>> ? |? + else >>>> ? |????? ocfs2_try_to_merge_extent () >>>> ? | >>>> ? + ocfs2_journal_dirty???? //[c] >>>> >>>>> >>>>> The whole code flow: >>>>> ocfs2_ioctl_move_extents >>>>> ?? ocfs2_move_extents >>>>> ???? __ocfs2_move_extents_range >>>>> ?????? ocfs2_defrag_extent? [1] >>>>> ???????? ocfs2_start_trans? [a] >>>>> ???????? __ocfs2_move_extent >>>>> ?????????? ocfs2_journal_access_di >>>>> ?????????? ocfs2_split_extent >>>>> ?????????? ocfs2_journal_dirty >>>>> ???????? ocfs2_commit_trans >>>>> ?????? ocfs2_move_extent? [2] >>>>> ???????? ocfs2_start_trans >>>>> ???????? __ocfs2_move_extent >>>>> ?????????? ocfs2_journal_access_di >>>>> ?????????? ocfs2_split_extent >>>>> ?????????? ocfs2_journal_dirty >>>>> ???????? ocfs2_commit_trans >>>>> ???? ocfs2_start_trans? [b] >>>>> ???? ocfs2_journal_access_di >>>>> ???? ocfs2_journal_dirty >>>>> ???? ocfs2_commit_trans >>>>> >>>>> In above, [a] and [b] are different transaction start/commit pair, and >>>>> each allocates different journal credits, even they are operate the same >>>>> bh, they are still different operations. >>>> >>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>>> is journal self-service. >>> >>> Don't understand what does 'journal self-service' mean. >> >> Sorry for my English skill, I invent this word 'journal self-service'. >> The meaning is: this function could handle journal operations totally by itself. >> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 >> journal start/stop pair. >> >> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, >> we even could add these two func in ocfs2_split_extent(). Then any caller won't >> take care of any journal operations when calling ocfs2_split_extent(). >> >>> >>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>>> changes), wanted [b] to protect inode ctime changes. the >>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>>> >>> If so, what is transaction [a] protect? (no dirty buffer) >> >> Transaction [a] protects the dirty buffers in ocfs2_split_extent(). >> ocfs2_split_extent() has already correctly used access/dirty pair to >> mark/handle dirty buffers. >> > Okay, I think I've gotten your idea now. You change looks reasonable. > Could you please also check if move extents without auto defrag also has > the same issue?OK, wait me for checking. Thanks, Heming> > Thanks, > Joseph >
Heming Zhao
2023-Feb-15 10:02 UTC
[Ocfs2-devel] discuss about jbd2 assertion in defragment path
On 2/15/23 2:42 PM, Joseph Qi wrote:> > > On 2/15/23 2:20 PM, Heming Zhao wrote: >> On 2/15/23 10:06 AM, Joseph Qi wrote: >>> >>> >>> On 2/14/23 7:48 PM, Heming Zhao wrote: >>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>>> Hi, >>>>> >>>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>>> Hi, Sorry about the late reply. >>>>>>> This thread is indeed a long time ago:( >>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>>> to you. >>>>>>> >>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>>> makes sense from theory. >>>>>> >>>>>> ?From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>>> ocfs2_split_extent() is journal self-service. >>>>>> >>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>>> make sure all journal handle branches are correct. I already checked >>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>>> >>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>>> that's why we have to do a transaction here. >>>> >>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>>> >>>> __ocfs2_move_extent >>>> ? + ocfs2_journal_access_di //[a]?? <------+ >>>> ? + ocfs2_split_extent????? //[b]????????? | [a, b.1) area is read operations >>>> ? |? + if?????????????????? //[b.1] <------+ >>>> ? |? |? A >>>> ? |? |? +---- from this line, this func starts r/w IOs & needs jbd2 operations >>>> ? |? | >>>> ? |? |?? ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>>> ? |? + else >>>> ? |????? ocfs2_try_to_merge_extent () >>>> ? | >>>> ? + ocfs2_journal_dirty???? //[c] >>>> >>>>> >>>>> The whole code flow: >>>>> ocfs2_ioctl_move_extents >>>>> ?? ocfs2_move_extents >>>>> ???? __ocfs2_move_extents_range >>>>> ?????? ocfs2_defrag_extent? [1] >>>>> ???????? ocfs2_start_trans? [a] >>>>> ???????? __ocfs2_move_extent >>>>> ?????????? ocfs2_journal_access_di >>>>> ?????????? ocfs2_split_extent >>>>> ?????????? ocfs2_journal_dirty >>>>> ???????? ocfs2_commit_trans >>>>> ?????? ocfs2_move_extent? [2] >>>>> ???????? ocfs2_start_trans >>>>> ???????? __ocfs2_move_extent >>>>> ?????????? ocfs2_journal_access_di >>>>> ?????????? ocfs2_split_extent >>>>> ?????????? ocfs2_journal_dirty >>>>> ???????? ocfs2_commit_trans >>>>> ???? ocfs2_start_trans? [b] >>>>> ???? ocfs2_journal_access_di >>>>> ???? ocfs2_journal_dirty >>>>> ???? ocfs2_commit_trans >>>>> >>>>> In above, [a] and [b] are different transaction start/commit pair, and >>>>> each allocates different journal credits, even they are operate the same >>>>> bh, they are still different operations. >>>> >>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>>> is journal self-service. >>> >>> Don't understand what does 'journal self-service' mean. >> >> Sorry for my English skill, I invent this word 'journal self-service'. >> The meaning is: this function could handle journal operations totally by itself. >> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 >> journal start/stop pair. >> >> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, >> we even could add these two func in ocfs2_split_extent(). Then any caller won't >> take care of any journal operations when calling ocfs2_split_extent(). >> >>> >>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>>> changes), wanted [b] to protect inode ctime changes. the >>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>>> >>> If so, what is transaction [a] protect? (no dirty buffer) >> >> Transaction [a] protects the dirty buffers in ocfs2_split_extent(). >> ocfs2_split_extent() has already correctly used access/dirty pair to >> mark/handle dirty buffers. >> > Okay, I think I've gotten your idea now. You change looks reasonable. > Could you please also check if move extents without auto defrag also has > the same issue? >I have finished code checking. ocfs2_move_extent() path has the same issue. My patch also works fine from this code branch. In theory, ocfs2_move_extent() has less opportunity to trigger crash, because userspace defrag.ocfs2 very likely set less than file size on 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this code branch. If you think my patch is acceptable, I will file v2 patch & plan only to change commit log in v2. Thanks, Heming