Mark Rutland
2018-Feb-26 11:38 UTC
v4.16-rc2: virtio-block + ext4 lockdep splats / sleeping from invalid context
On Mon, Feb 26, 2018 at 11:52:56AM +0100, Jan Kara wrote:
> On Fri 23-02-18 15:47:36, Mark Rutland wrote:
> > Hi all,
> >
> > While fuzzing arm64/v4.16-rc2 with syzkaller, I simultaneously hit a
> > number of splats in the block layer:
> >
> > * inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-R} usage in
> >   jbd2_trans_will_send_data_barrier
> >
> > * BUG: sleeping function called from invalid context at mm/mempool.c:320
> >
> > * WARNING: CPU: 0 PID: 0 at block/blk.h:297 generic_make_request_checks+0x670/0x750
> >
> > ... I've included the full splats at the end of the mail.
> >
> > These all happen in the context of the virtio block IRQ handler, so I
> > wonder if this calls something that doesn't expect to be called from IRQ
> > context. Is it valid to call blk_mq_complete_request() or
> > blk_mq_end_request() from an IRQ handler?
>
> No, it's likely a bug in detection whether IO completion should be deferred
> to a workqueue or not. Does attached patch fix the problem? I don't see
> exactly this being triggered by the syzkaller but it's close enough :)
>
> Honza

That seems to be it!

With the below patch applied, I can't trigger the bug after ~10 minutes,
whereas prior to the patch I can trigger it in ~10 seconds. I'll leave
that running for a while just in case there's another part to the
problem, but FWIW:

Tested-by: Mark Rutland <mark.rutland at arm.com>

Thanks,
Mark.

> From 501d97ed88f5020a55a0de4d546df5ad11461cea Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack at suse.cz>
> Date: Mon, 26 Feb 2018 11:36:52 +0100
> Subject: [PATCH] direct-io: Fix sleep in atomic due to sync AIO
>
> Commit e864f39569f4 "fs: add RWF_DSYNC aand RWF_SYNC" added additional
> way for direct IO to become synchronous and thus trigger fsync from the
> IO completion handler. Then commit 9830f4be159b "fs: Use RWF_* flags for
> AIO operations" allowed these flags to be set for AIO as well. However
> that commit forgot to update the condition checking whether the IO
> completion handling should be defered to a workqueue and thus AIO DIO
> with RWF_[D]SYNC set will call fsync() from IRQ context resulting in
> sleep in atomic.
>
> Fix the problem by checking directly iocb flags (the same way as it is
> done in dio_complete()) instead of checking all conditions that could
> lead to IO being synchronous.
>
> CC: Christoph Hellwig <hch at lst.de>
> CC: Goldwyn Rodrigues <rgoldwyn at suse.com>
> CC: stable at vger.kernel.org
> Reported-by: Mark Rutland <mark.rutland at arm.com>
> Fixes: 9830f4be159b29399d107bffb99e0132bc5aedd4
> Signed-off-by: Jan Kara <jack at suse.cz>
> ---
>  fs/direct-io.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index a0ca9e48e993..1357ef563893 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1274,8 +1274,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>           */
>          if (dio->is_async && iov_iter_rw(iter) == WRITE) {
>                  retval = 0;
> -                if ((iocb->ki_filp->f_flags & O_DSYNC) ||
> -                    IS_SYNC(iocb->ki_filp->f_mapping->host))
> +                if (iocb->ki_flags & IOCB_DSYNC)
>                          retval = dio_set_defer_completion(dio);
>                  else if (!dio->inode->i_sb->s_dio_done_wq) {
>                          /*
> --
> 2.13.6
>
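
For readers following the patch, here is a minimal userspace sketch of why the old condition missed the RWF_DSYNC case. It is not kernel code: the `model_*` names and flag values are hypothetical stand-ins. It assumes, as the commit message describes, that the per-iocb flag is set both when the file or inode is synchronous and when the individual request asks for RWF_DSYNC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; values are illustrative only. */
enum {
    MODEL_O_DSYNC    = 1 << 0, /* file was opened with O_DSYNC */
    MODEL_INODE_SYNC = 1 << 1, /* inode is synchronous (IS_SYNC in the patch) */
};
#define MODEL_RWF_DSYNC  (1 << 0) /* per-request sync flag */
#define MODEL_IOCB_DSYNC (1 << 0) /* models IOCB_DSYNC in iocb->ki_flags */

struct model_iocb {
    unsigned file_state; /* open flags and inode state of the target file */
    unsigned ki_flags;   /* per-request flags, as in struct kiocb */
};

/* Assumed model: ki_flags collects every source of "this write is sync". */
static struct model_iocb model_iocb_init(unsigned file_state, unsigned rw_flags)
{
    struct model_iocb iocb = { file_state, 0 };

    if (file_state & (MODEL_O_DSYNC | MODEL_INODE_SYNC))
        iocb.ki_flags |= MODEL_IOCB_DSYNC;
    if (rw_flags & MODEL_RWF_DSYNC)
        iocb.ki_flags |= MODEL_IOCB_DSYNC;
    return iocb;
}

/* Old condition: looks only at the file and inode, so it never sees RWF_DSYNC. */
static bool model_old_needs_defer(const struct model_iocb *iocb)
{
    return (iocb->file_state & (MODEL_O_DSYNC | MODEL_INODE_SYNC)) != 0;
}

/* New condition: looks at the per-request flag, which covers all sources. */
static bool model_new_needs_defer(const struct model_iocb *iocb)
{
    return (iocb->ki_flags & MODEL_IOCB_DSYNC) != 0;
}
```

In this model, an async write on a plain file with only the per-request flag set is the case the old check lets through to IRQ-context completion, while the new check defers it. For files opened with O_DSYNC both checks agree.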
Jan Kara
2018-Feb-26 12:44 UTC
v4.16-rc2: virtio-block + ext4 lockdep splats / sleeping from invalid context
On Mon 26-02-18 11:38:19, Mark Rutland wrote:
> On Mon, Feb 26, 2018 at 11:52:56AM +0100, Jan Kara wrote:
> > On Fri 23-02-18 15:47:36, Mark Rutland wrote:
> > > Hi all,
> > >
> > > While fuzzing arm64/v4.16-rc2 with syzkaller, I simultaneously hit a
> > > number of splats in the block layer:
> > >
> > > * inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-R} usage in
> > >   jbd2_trans_will_send_data_barrier
> > >
> > > * BUG: sleeping function called from invalid context at mm/mempool.c:320
> > >
> > > * WARNING: CPU: 0 PID: 0 at block/blk.h:297 generic_make_request_checks+0x670/0x750
> > >
> > > ... I've included the full splats at the end of the mail.
> > >
> > > These all happen in the context of the virtio block IRQ handler, so I
> > > wonder if this calls something that doesn't expect to be called from IRQ
> > > context. Is it valid to call blk_mq_complete_request() or
> > > blk_mq_end_request() from an IRQ handler?
> >
> > No, it's likely a bug in detection whether IO completion should be deferred
> > to a workqueue or not. Does attached patch fix the problem? I don't see
> > exactly this being triggered by the syzkaller but it's close enough :)
> >
> > Honza
>
> That seems to be it!
>
> With the below patch applied, I can't trigger the bug after ~10 minutes,
> whereas prior to the patch I can trigger it in ~10 seconds. I'll leave
> that running for a while just in case there's another part to the
> problem, but FWIW:
>
> Tested-by: Mark Rutland <mark.rutland at arm.com>

Thanks for testing! Sent the patch to Jens for inclusion.

Honza
--
Jan Kara <jack at suse.com>
SUSE Labs, CR
Mark Rutland
2018-Feb-27 12:03 UTC
v4.16-rc2: virtio-block + ext4 lockdep splats / sleeping from invalid context
On Mon, Feb 26, 2018 at 01:44:55PM +0100, Jan Kara wrote:
> On Mon 26-02-18 11:38:19, Mark Rutland wrote:
> > That seems to be it!
> >
> > With the below patch applied, I can't trigger the bug after ~10 minutes,
> > whereas prior to the patch I can trigger it in ~10 seconds. I'll leave
> > that running for a while just in case there's another part to the
> > problem, but FWIW:
> >
> > Tested-by: Mark Rutland <mark.rutland at arm.com>
>
> Thanks for testing! Sent the patch to Jens for inclusion.

Cheers!

FWIW, I left my test case running for a day with no issue, so this looks
rock solid.

Mark.