On 2/26/23 11:22 PM, Heming Zhao wrote:> On Sun, Feb 26, 2023 at 04:45:44PM +0800, Heming Zhao via Ocfs2-devel wrote: >> On Fri, Feb 24, 2023 at 06:59:47PM +0800, Joseph Qi wrote: >>> >>> >>> On 2/24/23 4:03 PM, Heming Zhao wrote: >>>> On 2/24/23 3:52 PM, Joseph Qi wrote: >>>>> >>>>> >>>>> On 2/24/23 3:48 PM, Heming Zhao via Ocfs2-devel wrote: >>>>>> On 2/24/23 2:54 PM, Joseph Qi wrote: >>>>>>> I can reproduce this in my local VM. >>>>>>> I've traced ocfs2_dismount_volume and found that it hasn't been called. >>>>>>> So EBUSY is returned in VFS layer. I guess something wrong when doing >>>>>>> a copy with linked SQEs (normal copy seems no problem). >>>>>>> >>>>>> >>>>>> I am inclined to agree with you. I also test liburing examples apps >>>>>> on ext4 partition, everything looks fine. >>>>>> >>>>>> I used below bpftrace method, the retval is '3'. >>>>>> ??bpftrace -e 'kr:mnt_get_count{printf("%d\n", retval);}' >>>>>> >>>>>> It responds to flow: path_umount() => do_umount => mnt_get_count (gets '3') >>>>>> >>>>> Yes, that's the place return EBUSY. >>>>> So the problem seems to be getmnt/putmnt not match in this case. >>>>> >>>> >>>> I didn't familiar with setting up kernel bi-search env. I used one last year >>>> openSUSE tumblweed (with kernel 5.16.2), this umount issue doesn't exist. >>>> So there is a possibility one ocfs2 commit introduced this issue. >>> You can checkout each mailine version like Linux 6.0, 6.1, ... and try >>> to check if it can be reproduced. >> >> I drop this method, my machine cpu is old, compiling kernel will take too much >> time. >> >>> >>> I've tried trace mntget/mntput using the following bpftrace script, >>> link-cp output shows it misses a fput. >>> >>> #include <linux/mount.h> >>> #include <linux/string.h> >>> >>> kprobe:mntget >>> { >>> $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; >>> >>> if (!strncmp(str($n), "ocfs2", 5)) { >>> @[comm] += 1; >>> >>> printf("%s", kstack); >>> } >>> } >>> >>> kprobe:mntput >>> { >>> $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; >>> >>> if (!strncmp(str($n), "ocfs2", 5)) { >>> @[comm] +=1; >>> >>> printf("%s", kstack); >>> } >>> } >>> >> >> I used below script to find source file hold f_count. >> >> e.g: ./link-cp /mnt/aa.bin /mnt/1 >> The script will show f_count of aa.bin only down to 3. We expect f_count value >> down to 1 then trigger mntput. >> >> ``` >> #include <linux/mount.h> >> #include <linux/string.h> >> #include <linux/dcache.h> >> #include <linux/path.h> >> #include <linux/fs_struct.h> >> #include <linux/fs.h> >> #include <linux/io_uring_types.h> >> >> k:fput >> { >> $f = (struct file *)arg0; >> $cnt = $f->f_count; >> >> if (!strncmp(comm, "link-cp", 7)) { >> printf("\n [link-cp] fput() name:%s f_count:%lld <%p> mnt:%p\n", >> str($f->f_path.dentry->d_name.name), $cnt.counter, kptr($f), kptr($f->f_path.mnt)); >> //printf("%s\n", kstack); >> } >> } >> ``` >> > > I am not familiar with io_uring, and can't make sure my analysis is correct. > Because ocfs2_file_read_iter() returns -EOPNOTSUPP, then fails > io_iter_do_read(), the io_issue_sqe missing fput() which causes umount failure > issue. > > io_issue_sqe > + io_assign_file //call fget(), but missing fput() > + def->issue() > | io_read > | + io_iter_do_read > | | ocfs2_file_read_iter > | | return: -EOPNOTSUPP //iocb->ki_flags:0x8 > | + kiocb_done > | + io_rw_done(&rw->kiocb, ret) > | | kiocb->ki_complete(kiocb, ret) > | | io_complete_rw > | | __io_complete_rw_common > | | + req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO; > | | + return true > | + io_req_task_queue_reissue/io_req_task_queue_fail > | + return IOU_ISSUE_SKIP_COMPLETE; > | > + return 0; >Seems not the case. Reissue will queue work to iowq, and fput will finally get called in io_wq_free_work() once work is handled. Thanks, Joseph
On 2/27/23 10:59 AM, Joseph Qi wrote:> > > On 2/26/23 11:22 PM, Heming Zhao wrote: >> On Sun, Feb 26, 2023 at 04:45:44PM +0800, Heming Zhao via Ocfs2-devel wrote: >>> On Fri, Feb 24, 2023 at 06:59:47PM +0800, Joseph Qi wrote: >>>> >>>> >>>> On 2/24/23 4:03 PM, Heming Zhao wrote: >>>>> On 2/24/23 3:52 PM, Joseph Qi wrote: >>>>>> >>>>>> >>>>>> On 2/24/23 3:48 PM, Heming Zhao via Ocfs2-devel wrote: >>>>>>> On 2/24/23 2:54 PM, Joseph Qi wrote: >>>>>>>> I can reproduce this in my local VM. >>>>>>>> I've traced ocfs2_dismount_volume and found that it hasn't been called. >>>>>>>> So EBUSY is returned in VFS layer. I guess something wrong when doing >>>>>>>> a copy with linked SQEs (normal copy seems no problem). >>>>>>>> >>>>>>> >>>>>>> I am inclined to agree with you. I also test liburing examples apps >>>>>>> on ext4 partition, everything looks fine. >>>>>>> >>>>>>> I used below bpftrace method, the retval is '3'. >>>>>>> ??bpftrace -e 'kr:mnt_get_count{printf("%d\n", retval);}' >>>>>>> >>>>>>> It responds to flow: path_umount() => do_umount => mnt_get_count (gets '3') >>>>>>> >>>>>> Yes, that's the place return EBUSY. >>>>>> So the problem seems to be getmnt/putmnt not match in this case. >>>>>> >>>>> >>>>> I didn't familiar with setting up kernel bi-search env. I used one last year >>>>> openSUSE tumblweed (with kernel 5.16.2), this umount issue doesn't exist. >>>>> So there is a possibility one ocfs2 commit introduced this issue. >>>> You can checkout each mailine version like Linux 6.0, 6.1, ... and try >>>> to check if it can be reproduced. >>> >>> I drop this method, my machine cpu is old, compiling kernel will take too much >>> time. >>> >>>> >>>> I've tried trace mntget/mntput using the following bpftrace script, >>>> link-cp output shows it misses a fput. >>>> >>>> #include <linux/mount.h> >>>> #include <linux/string.h> >>>> >>>> kprobe:mntget >>>> { >>>> $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; >>>> >>>> if (!strncmp(str($n), "ocfs2", 5)) { >>>> @[comm] += 1; >>>> >>>> printf("%s", kstack); >>>> } >>>> } >>>> >>>> kprobe:mntput >>>> { >>>> $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; >>>> >>>> if (!strncmp(str($n), "ocfs2", 5)) { >>>> @[comm] +=1; >>>> >>>> printf("%s", kstack); >>>> } >>>> } >>>> >>> >>> I used below script to find source file hold f_count. >>> >>> e.g: ./link-cp /mnt/aa.bin /mnt/1 >>> The script will show f_count of aa.bin only down to 3. We expect f_count value >>> down to 1 then trigger mntput. >>> >>> ``` >>> #include <linux/mount.h> >>> #include <linux/string.h> >>> #include <linux/dcache.h> >>> #include <linux/path.h> >>> #include <linux/fs_struct.h> >>> #include <linux/fs.h> >>> #include <linux/io_uring_types.h> >>> >>> k:fput >>> { >>> $f = (struct file *)arg0; >>> $cnt = $f->f_count; >>> >>> if (!strncmp(comm, "link-cp", 7)) { >>> printf("\n [link-cp] fput() name:%s f_count:%lld <%p> mnt:%p\n", >>> str($f->f_path.dentry->d_name.name), $cnt.counter, kptr($f), kptr($f->f_path.mnt)); >>> //printf("%s\n", kstack); >>> } >>> } >>> ``` >>> >> >> I am not familiar with io_uring, and can't make sure my analysis is correct. >> Because ocfs2_file_read_iter() returns -EOPNOTSUPP, then fails >> io_iter_do_read(), the io_issue_sqe missing fput() which causes umount failure >> issue. >> >> io_issue_sqe >> + io_assign_file //call fget(), but missing fput() >> + def->issue() >> | io_read >> | + io_iter_do_read >> | | ocfs2_file_read_iter >> | | return: -EOPNOTSUPP //iocb->ki_flags:0x8 >> | + kiocb_done >> | + io_rw_done(&rw->kiocb, ret) >> | | kiocb->ki_complete(kiocb, ret) >> | | io_complete_rw >> | | __io_complete_rw_common >> | | + req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO; >> | | + return true >> | + io_req_task_queue_reissue/io_req_task_queue_fail >> | + return IOU_ISSUE_SKIP_COMPLETE; >> | >> + return 0; >> > Seems not the case. > Reissue will queue work to iowq, and fput will finally get called in > io_wq_free_work() once work is handled. > I think I've found the root cause, will send a patch later.Thanks, Joseph
Hi, On 2/27/23 10:59 AM, Joseph Qi via Ocfs2-devel wrote:> > > On 2/26/23 11:22 PM, Heming Zhao wrote: >> On Sun, Feb 26, 2023 at 04:45:44PM +0800, Heming Zhao via Ocfs2-devel wrote: >>> On Fri, Feb 24, 2023 at 06:59:47PM +0800, Joseph Qi wrote: >>>> >>>> >>>> On 2/24/23 4:03 PM, Heming Zhao wrote: >>>>> On 2/24/23 3:52 PM, Joseph Qi wrote: >>>>>> >>>>>> >>>>>> On 2/24/23 3:48 PM, Heming Zhao via Ocfs2-devel wrote: >>>>>>> On 2/24/23 2:54 PM, Joseph Qi wrote: >>>>>>>> I can reproduce this in my local VM. >>>>>>>> I've traced ocfs2_dismount_volume and found that it hasn't been called. >>>>>>>> So EBUSY is returned in VFS layer. I guess something wrong when doing >>>>>>>> a copy with linked SQEs (normal copy seems no problem). >>>>>>>> >>>>>>> >>>>>>> I am inclined to agree with you. I also test liburing examples apps >>>>>>> on ext4 partition, everything looks fine. >>>>>>> >>>>>>> I used below bpftrace method, the retval is '3'. >>>>>>> ??bpftrace -e 'kr:mnt_get_count{printf("%d\n", retval);}' >>>>>>> >>>>>>> It responds to flow: path_umount() => do_umount => mnt_get_count (gets '3') >>>>>>> >>>>>> Yes, that's the place return EBUSY. >>>>>> So the problem seems to be getmnt/putmnt not match in this case. >>>>>> >>>>> >>>>> I didn't familiar with setting up kernel bi-search env. I used one last year >>>>> openSUSE tumblweed (with kernel 5.16.2), this umount issue doesn't exist. >>>>> So there is a possibility one ocfs2 commit introduced this issue. >>>> You can checkout each mailine version like Linux 6.0, 6.1, ... and try >>>> to check if it can be reproduced. >>> >>> I drop this method, my machine cpu is old, compiling kernel will take too much >>> time. >>> >>>> >>>> I've tried trace mntget/mntput using the following bpftrace script, >>>> link-cp output shows it misses a fput. >>>> >>>> #include <linux/mount.h> >>>> #include <linux/string.h> >>>> >>>> kprobe:mntget >>>> { >>>> $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; >>>> >>>> if (!strncmp(str($n), "ocfs2", 5)) { >>>> @[comm] += 1; >>>> >>>> printf("%s", kstack); >>>> } >>>> } >>>> >>>> kprobe:mntput >>>> { >>>> $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; >>>> >>>> if (!strncmp(str($n), "ocfs2", 5)) { >>>> @[comm] +=1; >>>> >>>> printf("%s", kstack); >>>> } >>>> } >>>> >>> >>> I used below script to find source file hold f_count. >>> >>> e.g: ./link-cp /mnt/aa.bin /mnt/1 >>> The script will show f_count of aa.bin only down to 3. We expect f_count value >>> down to 1 then trigger mntput. >>> >>> ``` >>> #include <linux/mount.h> >>> #include <linux/string.h> >>> #include <linux/dcache.h> >>> #include <linux/path.h> >>> #include <linux/fs_struct.h> >>> #include <linux/fs.h> >>> #include <linux/io_uring_types.h> >>> >>> k:fput >>> { >>> $f = (struct file *)arg0; >>> $cnt = $f->f_count; >>> >>> if (!strncmp(comm, "link-cp", 7)) { >>> printf("\n [link-cp] fput() name:%s f_count:%lld <%p> mnt:%p\n", >>> str($f->f_path.dentry->d_name.name), $cnt.counter, kptr($f), kptr($f->f_path.mnt)); >>> //printf("%s\n", kstack); >>> } >>> } >>> ``` >>> >> >> I am not familiar with io_uring, and can't make sure my analysis is correct. >> Because ocfs2_file_read_iter() returns -EOPNOTSUPP, then fails >> io_iter_do_read(), the io_issue_sqe missing fput() which causes umount failure >> issue. >> >> io_issue_sqe >> + io_assign_file //call fget(), but missing fput() >> + def->issue() >> | io_read >> | + io_iter_do_read >> | | ocfs2_file_read_iter >> | | return: -EOPNOTSUPP //iocb->ki_flags:0x8 >> | + kiocb_done >> | + io_rw_done(&rw->kiocb, ret) >> | | kiocb->ki_complete(kiocb, ret) >> | | io_complete_rw >> | | __io_complete_rw_common >> | | + req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO; >> | | + return true >> | + io_req_task_queue_reissue/io_req_task_queue_fail >> | + return IOU_ISSUE_SKIP_COMPLETE; >> | >> + return 0; >> > Seems not the case. > Reissue will queue work to iowq, and fput will finally get called in > io_wq_free_work() once work is handled. >I've sent a fix to io_uring, please refer: https://lore.kernel.org/io-uring/167758917178.12826.13481934362166793957.b4-ty at kernel.dk/T/#t Thanks, Joseph