Heming Zhao
2023-Feb-26 08:45 UTC
[Ocfs2-devel] report BUG: io_uring triggers umount error
On Fri, Feb 24, 2023 at 06:59:47PM +0800, Joseph Qi wrote:> > > On 2/24/23 4:03 PM, Heming Zhao wrote: > > On 2/24/23 3:52 PM, Joseph Qi wrote: > >> > >> > >> On 2/24/23 3:48 PM, Heming Zhao via Ocfs2-devel wrote: > >>> On 2/24/23 2:54 PM, Joseph Qi wrote: > >>>> I can reproduce this in my local VM. > >>>> I've traced ocfs2_dismount_volume and found that it hasn't been called. > >>>> So EBUSY is returned in VFS layer. I guess something wrong when doing > >>>> a copy with linked SQEs (normal copy seems no problem). > >>>> > >>> > >>> I am inclined to agree with you. I also test liburing examples apps > >>> on ext4 partition, everything looks fine. > >>> > >>> I used below bpftrace method, the retval is '3'. > >>> ??bpftrace -e 'kr:mnt_get_count{printf("%d\n", retval);}' > >>> > >>> It responds to flow: path_umount() => do_umount => mnt_get_count (gets '3') > >>> > >> Yes, that's the place return EBUSY. > >> So the problem seems to be getmnt/putmnt not match in this case. > >> > > > > I didn't familiar with setting up kernel bi-search env. I used one last year > > openSUSE tumblweed (with kernel 5.16.2), this umount issue doesn't exist. > > So there is a possibility one ocfs2 commit introduced this issue. >You can checkout each mailine version like Linux 6.0, 6.1, ... and try > to check if it can be reproduced.I drop this method, my machine cpu is old, compiling kernel will take too much time.> > I've tried trace mntget/mntput using the following bpftrace script, > link-cp output shows it misses a fput. > > #include <linux/mount.h> > #include <linux/string.h> > > kprobe:mntget > { > $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; > > if (!strncmp(str($n), "ocfs2", 5)) { > @[comm] += 1; > > printf("%s", kstack); > } > } > > kprobe:mntput > { > $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; > > if (!strncmp(str($n), "ocfs2", 5)) { > @[comm] +=1; > > printf("%s", kstack); > } > } >I used below script to find source file hold f_count. e.g: ./link-cp /mnt/aa.bin /mnt/1 The script will show f_count of aa.bin only down to 3. We expect f_count value down to 1 then trigger mntput. ``` #include <linux/mount.h> #include <linux/string.h> #include <linux/dcache.h> #include <linux/path.h> #include <linux/fs_struct.h> #include <linux/fs.h> #include <linux/io_uring_types.h> k:fput { $f = (struct file *)arg0; $cnt = $f->f_count; if (!strncmp(comm, "link-cp", 7)) { printf("\n [link-cp] fput() name:%s f_count:%lld <%p> mnt:%p\n", str($f->f_path.dentry->d_name.name), $cnt.counter, kptr($f), kptr($f->f_path.mnt)); //printf("%s\n", kstack); } } ``` - Heming
Heming Zhao
2023-Feb-26 15:22 UTC
[Ocfs2-devel] report BUG: io_uring triggers umount error
On Sun, Feb 26, 2023 at 04:45:44PM +0800, Heming Zhao via Ocfs2-devel wrote:> On Fri, Feb 24, 2023 at 06:59:47PM +0800, Joseph Qi wrote: > > > > > > On 2/24/23 4:03 PM, Heming Zhao wrote: > > > On 2/24/23 3:52 PM, Joseph Qi wrote: > > >> > > >> > > >> On 2/24/23 3:48 PM, Heming Zhao via Ocfs2-devel wrote: > > >>> On 2/24/23 2:54 PM, Joseph Qi wrote: > > >>>> I can reproduce this in my local VM. > > >>>> I've traced ocfs2_dismount_volume and found that it hasn't been called. > > >>>> So EBUSY is returned in VFS layer. I guess something wrong when doing > > >>>> a copy with linked SQEs (normal copy seems no problem). > > >>>> > > >>> > > >>> I am inclined to agree with you. I also test liburing examples apps > > >>> on ext4 partition, everything looks fine. > > >>> > > >>> I used below bpftrace method, the retval is '3'. > > >>> ??bpftrace -e 'kr:mnt_get_count{printf("%d\n", retval);}' > > >>> > > >>> It responds to flow: path_umount() => do_umount => mnt_get_count (gets '3') > > >>> > > >> Yes, that's the place return EBUSY. > > >> So the problem seems to be getmnt/putmnt not match in this case. > > >> > > > > > > I didn't familiar with setting up kernel bi-search env. I used one last year > > > openSUSE tumblweed (with kernel 5.16.2), this umount issue doesn't exist. > > > So there is a possibility one ocfs2 commit introduced this issue. > >You can checkout each mailine version like Linux 6.0, 6.1, ... and try > > to check if it can be reproduced. > > I drop this method, my machine cpu is old, compiling kernel will take too much > time. > > > > > I've tried trace mntget/mntput using the following bpftrace script, > > link-cp output shows it misses a fput. > > > > #include <linux/mount.h> > > #include <linux/string.h> > > > > kprobe:mntget > > { > > $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; > > > > if (!strncmp(str($n), "ocfs2", 5)) { > > @[comm] += 1; > > > > printf("%s", kstack); > > } > > } > > > > kprobe:mntput > > { > > $n = ((struct vfsmount *)arg0)->mnt_sb->s_type->name; > > > > if (!strncmp(str($n), "ocfs2", 5)) { > > @[comm] +=1; > > > > printf("%s", kstack); > > } > > } > > > > I used below script to find source file hold f_count. > > e.g: ./link-cp /mnt/aa.bin /mnt/1 > The script will show f_count of aa.bin only down to 3. We expect f_count value > down to 1 then trigger mntput. > > ``` > #include <linux/mount.h> > #include <linux/string.h> > #include <linux/dcache.h> > #include <linux/path.h> > #include <linux/fs_struct.h> > #include <linux/fs.h> > #include <linux/io_uring_types.h> > > k:fput > { > $f = (struct file *)arg0; > $cnt = $f->f_count; > > if (!strncmp(comm, "link-cp", 7)) { > printf("\n [link-cp] fput() name:%s f_count:%lld <%p> mnt:%p\n", > str($f->f_path.dentry->d_name.name), $cnt.counter, kptr($f), kptr($f->f_path.mnt)); > //printf("%s\n", kstack); > } > } > ``` >I am not familiar with io_uring, and can't make sure my analysis is correct. Because ocfs2_file_read_iter() returns -EOPNOTSUPP, then fails io_iter_do_read(), the io_issue_sqe missing fput() which causes umount failure issue. io_issue_sqe + io_assign_file //call fget(), but missing fput() + def->issue() | io_read | + io_iter_do_read | | ocfs2_file_read_iter | | return: -EOPNOTSUPP //iocb->ki_flags:0x8 | + kiocb_done | + io_rw_done(&rw->kiocb, ret) | | kiocb->ki_complete(kiocb, ret) | | io_complete_rw | | __io_complete_rw_common | | + req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO; | | + return true | + io_req_task_queue_reissue/io_req_task_queue_fail | + return IOU_ISSUE_SKIP_COMPLETE; | + return 0; - Heming