Changwei Ge
2018-Apr-11 01:07 UTC
[Ocfs2-devel] [PATCH] ocfs2: don't use iocb when EIOCBQUEUED returns
Hi Jun, On 2018/4/11 8:52, piaojun wrote:> Hi Changwei, > > It looks like a code bug, and 'iocb' should not be freed at this place. > Could this BUG reproduced easily?Actually, it's not easy to be reproduced since IO is much slower than CPU executing instructions. But the logic here is broken, we'd better fix this. Thanks, Changwei> > thanks, > Jun > > On 2018/4/10 20:00, Changwei Ge wrote: >> When -EIOCBQUEUED returns, it means that aio_complete() will be called >> from dio_complete(), which is an asynchronous progress against write_iter. >> Generally, IO is a very slow progress than executing instruction, but we >> still can't take the risk to access a freed iocb. >> >> And we do face a BUG crash issue. >> >From crash tool, iocb is obviously freed already. >> crash> struct -x kiocb ffff881a350f5900 >> struct kiocb { >> ki_filp = 0xffff881a350f5a80, >> ki_pos = 0x0, >> ki_complete = 0x0, >> private = 0x0, >> ki_flags = 0x0 >> } >> >> And the backtrace shows: >> ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2] >> ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] >> aio_run_iocb+0x229/0x2f0 >> ? try_to_wake_up+0x380/0x380 >> do_io_submit+0x291/0x540 >> ? syscall_trace_leave+0xad/0x130 >> SyS_io_submit+0x10/0x20 >> system_call_fastpath+0x16/0x75 >> >> Signed-off-by: Changwei Ge <ge.changwei at h3c.com> >> --- >> fs/ocfs2/file.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >> index 5d1784a..1393ff2 100644 >> --- a/fs/ocfs2/file.c >> +++ b/fs/ocfs2/file.c >> @@ -2343,7 +2343,7 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb, >> >> written = __generic_file_write_iter(iocb, from); >> /* buffered aio wouldn't have proper lock coverage today */ >> - BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT)); >> + BUG_ON(written == -EIOCBQUEUED && !direct_io); >> >> /* >> * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io >> @@ -2463,7 +2463,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb, >> trace_generic_file_aio_read_ret(ret); >> >> /* buffered aio wouldn't have proper lock coverage today */ >> - BUG_ON(ret == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT)); >> + BUG_ON(ret == -EIOCBQUEUED && !direct_io); >> >> /* see ocfs2_file_write_iter */ >> if (ret == -EIOCBQUEUED || !ocfs2_iocb_is_rw_locked(iocb)) { >> >
piaojun
2018-Apr-11 01:51 UTC
[Ocfs2-devel] [PATCH] ocfs2: don't use iocb when EIOCBQUEUED returns
Hi Changwei, It seems other codes which try to access 'iocb' will also cause error, right? I think we should find the reason why 'iocb' is freed first. thanks, Jun On 2018/4/11 9:07, Changwei Ge wrote:> Hi Jun, > > On 2018/4/11 8:52, piaojun wrote: >> Hi Changwei, >> >> It looks like a code bug, and 'iocb' should not be freed at this place. >> Could this BUG reproduced easily? > > Actually, it's not easy to be reproduced since IO is much slower than CPU > executing instructions. But the logic here is broken, we'd better fix this. > > Thanks, > Changwei > >> >> thanks, >> Jun >> >> On 2018/4/10 20:00, Changwei Ge wrote: >>> When -EIOCBQUEUED returns, it means that aio_complete() will be called >>> from dio_complete(), which is an asynchronous progress against write_iter. >>> Generally, IO is a very slow progress than executing instruction, but we >>> still can't take the risk to access a freed iocb. >>> >>> And we do face a BUG crash issue. >>> >From crash tool, iocb is obviously freed already. >>> crash> struct -x kiocb ffff881a350f5900 >>> struct kiocb { >>> ki_filp = 0xffff881a350f5a80, >>> ki_pos = 0x0, >>> ki_complete = 0x0, >>> private = 0x0, >>> ki_flags = 0x0 >>> } >>> >>> And the backtrace shows: >>> ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2] >>> ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] >>> aio_run_iocb+0x229/0x2f0 >>> ? try_to_wake_up+0x380/0x380 >>> do_io_submit+0x291/0x540 >>> ? syscall_trace_leave+0xad/0x130 >>> SyS_io_submit+0x10/0x20 >>> system_call_fastpath+0x16/0x75 >>> >>> Signed-off-by: Changwei Ge <ge.changwei at h3c.com> >>> --- >>> fs/ocfs2/file.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>> index 5d1784a..1393ff2 100644 >>> --- a/fs/ocfs2/file.c >>> +++ b/fs/ocfs2/file.c >>> @@ -2343,7 +2343,7 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb, >>> >>> written = __generic_file_write_iter(iocb, from); >>> /* buffered aio wouldn't have proper lock coverage today */ >>> - BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT)); >>> + BUG_ON(written == -EIOCBQUEUED && !direct_io); >>> >>> /* >>> * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io >>> @@ -2463,7 +2463,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb, >>> trace_generic_file_aio_read_ret(ret); >>> >>> /* buffered aio wouldn't have proper lock coverage today */ >>> - BUG_ON(ret == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT)); >>> + BUG_ON(ret == -EIOCBQUEUED && !direct_io); >>> >>> /* see ocfs2_file_write_iter */ >>> if (ret == -EIOCBQUEUED || !ocfs2_iocb_is_rw_locked(iocb)) { >>> >> > . >
Gang He
2018-Apr-11 02:51 UTC
[Ocfs2-devel] [PATCH] ocfs2: don't use iocb when EIOCBQUEUED returns
Hi Changwei, The code change just works around the problem, but theoretically the IOCB object should not be freed before which is handled. Anyway, if we can find the root cause behind via some way (e.g. inject delay in some place), the result is more perfect. Thanks Gang>>> > Hi Jun, > > On 2018/4/11 8:52, piaojun wrote: >> Hi Changwei, >> >> It looks like a code bug, and 'iocb' should not be freed at this place. >> Could this BUG reproduced easily? > > Actually, it's not easy to be reproduced since IO is much slower than CPU > executing instructions. But the logic here is broken, we'd better fix this. > > Thanks, > Changwei > >> >> thanks, >> Jun >> >> On 2018/4/10 20:00, Changwei Ge wrote: >>> When -EIOCBQUEUED returns, it means that aio_complete() will be called >>> from dio_complete(), which is an asynchronous progress against write_iter. >>> Generally, IO is a very slow progress than executing instruction, but we >>> still can't take the risk to access a freed iocb. >>> >>> And we do face a BUG crash issue. >>> >From crash tool, iocb is obviously freed already. >>> crash> struct -x kiocb ffff881a350f5900 >>> struct kiocb { >>> ki_filp = 0xffff881a350f5a80, >>> ki_pos = 0x0, >>> ki_complete = 0x0, >>> private = 0x0, >>> ki_flags = 0x0 >>> } >>> >>> And the backtrace shows: >>> ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2] >>> ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] >>> aio_run_iocb+0x229/0x2f0 >>> ? try_to_wake_up+0x380/0x380 >>> do_io_submit+0x291/0x540 >>> ? syscall_trace_leave+0xad/0x130 >>> SyS_io_submit+0x10/0x20 >>> system_call_fastpath+0x16/0x75 >>> >>> Signed-off-by: Changwei Ge <ge.changwei at h3c.com> >>> --- >>> fs/ocfs2/file.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>> index 5d1784a..1393ff2 100644 >>> --- a/fs/ocfs2/file.c >>> +++ b/fs/ocfs2/file.c >>> @@ -2343,7 +2343,7 @@ static ssize_t ocfs2_file_write_iter(struct kiocb > *iocb, >>> >>> written = __generic_file_write_iter(iocb, from); >>> /* buffered aio wouldn't have proper lock coverage today */ >>> - BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT)); >>> + BUG_ON(written == -EIOCBQUEUED && !direct_io); >>> >>> /* >>> * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io >>> @@ -2463,7 +2463,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb, >>> trace_generic_file_aio_read_ret(ret); >>> >>> /* buffered aio wouldn't have proper lock coverage today */ >>> - BUG_ON(ret == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT)); >>> + BUG_ON(ret == -EIOCBQUEUED && !direct_io); >>> >>> /* see ocfs2_file_write_iter */ >>> if (ret == -EIOCBQUEUED || !ocfs2_iocb_is_rw_locked(iocb)) { >>> >> > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel at oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Changwei Ge
2018-May-08 15:23 UTC
[Ocfs2-devel] [PATCH] ocfs2: don't use iocb when EIOCBQUEUED returns
Hi Gang, I don't think this patch is a workaround trick. We do face the risk using freed iocb although it is actually indeed hard to encounter, it still exists. So I propose to fix it making ocfs2 more reliable. Moreover, this patch has been kept in -mm tree for one month. Can anyone help review it with ack or nack? So I can do some improvement for it. :-) Thanks, Changwei On 04/11/2018 10:51 AM, Gang He wrote:> Hi Changwei, > > The code change just works around the problem, but theoretically the IOCB object should not be freed before which is handled. > Anyway, if we can find the root cause behind via some way (e.g. inject delay in some place), the result is more perfect. > > > Thanks > Gang > > >> Hi Jun, >> >> On 2018/4/11 8:52, piaojun wrote: >>> Hi Changwei, >>> >>> It looks like a code bug, and 'iocb' should not be freed at this place. >>> Could this BUG reproduced easily? >> Actually, it's not easy to be reproduced since IO is much slower than CPU >> executing instructions. But the logic here is broken, we'd better fix this. >> >> Thanks, >> Changwei >> >>> thanks, >>> Jun >>> >>> On 2018/4/10 20:00, Changwei Ge wrote: >>>> When -EIOCBQUEUED returns, it means that aio_complete() will be called >>>> from dio_complete(), which is an asynchronous progress against write_iter. >>>> Generally, IO is a very slow progress than executing instruction, but we >>>> still can't take the risk to access a freed iocb. >>>> >>>> And we do face a BUG crash issue. >>>> >From crash tool, iocb is obviously freed already. >>>> crash> struct -x kiocb ffff881a350f5900 >>>> struct kiocb { >>>> ki_filp = 0xffff881a350f5a80, >>>> ki_pos = 0x0, >>>> ki_complete = 0x0, >>>> private = 0x0, >>>> ki_flags = 0x0 >>>> } >>>> >>>> And the backtrace shows: >>>> ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2] >>>> ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] >>>> aio_run_iocb+0x229/0x2f0 >>>> ? try_to_wake_up+0x380/0x380 >>>> do_io_submit+0x291/0x540 >>>> ? syscall_trace_leave+0xad/0x130 >>>> SyS_io_submit+0x10/0x20 >>>> system_call_fastpath+0x16/0x75 >>>> >>>> Signed-off-by: Changwei Ge <ge.changwei at h3c.com> >>>> --- >>>> fs/ocfs2/file.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>>> index 5d1784a..1393ff2 100644 >>>> --- a/fs/ocfs2/file.c >>>> +++ b/fs/ocfs2/file.c >>>> @@ -2343,7 +2343,7 @@ static ssize_t ocfs2_file_write_iter(struct kiocb >> *iocb, >>>> >>>> written = __generic_file_write_iter(iocb, from); >>>> /* buffered aio wouldn't have proper lock coverage today */ >>>> - BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT)); >>>> + BUG_ON(written == -EIOCBQUEUED && !direct_io); >>>> >>>> /* >>>> * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io >>>> @@ -2463,7 +2463,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb, >>>> trace_generic_file_aio_read_ret(ret); >>>> >>>> /* buffered aio wouldn't have proper lock coverage today */ >>>> - BUG_ON(ret == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT)); >>>> + BUG_ON(ret == -EIOCBQUEUED && !direct_io); >>>> >>>> /* see ocfs2_file_write_iter */ >>>> if (ret == -EIOCBQUEUED || !ocfs2_iocb_is_rw_locked(iocb)) { >>>> >> _______________________________________________ >> Ocfs2-devel mailing list >> Ocfs2-devel at oss.oracle.com >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel at oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel