Juergen Keil
2009-May-28 11:48 UTC
6830386: recursive mutex_enter panic when reading past EOF with xdf driver?
Hi,

Can anyone provide some info on the root cause for bug 6830386?
"System panics when delete logical partition during dd on it in PV host"
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6830386

Does it have the same root cause as the one for opensolaris defect 9198;
a read past EOF for an xdf device leaves the "vdp->xdf_dev_lk" mutex
locked?
http://defect.opensolaris.org/bz/show_bug.cgi?id=9198#c1
Mark Johnson
2009-May-28 11:56 UTC
Re: 6830386: recursive mutex_enter panic when reading past EOF with xdf driver?
Juergen Keil wrote:
> Hi,
>
> Can anyone provide some info on the root cause for bug 6830386?
> "System panics when delete logical partition during dd on it in PV host"
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6830386
>
> Does it have the same root cause as the one for opensolaris defect 9198;
> a read past EOF for an xdf device leaves the "vdp->xdf_dev_lk" mutex
> locked?
> http://defect.opensolaris.org/bz/show_bug.cgi?id=9198#c1

Yes, the suggested fix from bug 6830386 is:

static int
xdf_strategy(struct buf *bp)
{
        ...
        ...
        if (bp->b_blkno > p_blkct) {
                DPRINTF(IO_DBG, ("xdf@%s: block %lld exceeds VBD size %"PRIu64,
                    vdp->xdf_addr, (longlong_t)bp->b_blkno,
                    (uint64_t)p_blkct));
+               mutex_exit(&vdp->xdf_dev_lk);
                xdf_io_err(bp, EINVAL, 0);
                return (0);
        }

Thanks,

MRJ
Juergen Keil
2009-May-28 12:35 UTC
Re: 6830386: recursive mutex_enter panic when reading past EOF with xdf driver?
2009/5/28 Mark Johnson <Mark.Johnson@sun.com>:
> Yes, the suggested fix from bug 6830386 is:
>
> static int
> xdf_strategy(struct buf *bp)
> {
>         ...
>         ...
>         if (bp->b_blkno > p_blkct) {
>                 DPRINTF(IO_DBG, ("xdf@%s: block %lld exceeds VBD size %"PRIu64,
>                     vdp->xdf_addr, (longlong_t)bp->b_blkno,
>                     (uint64_t)p_blkct));
> +               mutex_exit(&vdp->xdf_dev_lk);
>                 xdf_io_err(bp, EINVAL, 0);
>                 return (0);
>         }

What about the next "if (bp->b_blkno == p_blkct) ..."?
The mutex_exit should be added in two places...
Mark Johnson
2009-May-28 13:01 UTC
Re: 6830386: recursive mutex_enter panic when reading past EOF with xdf driver?
Juergen Keil wrote:
> What about the next "if (bp->b_blkno == p_blkct) ..."?
> The mutex_exit should be added in two places...

Yes, you are right. I've updated the bug and contacted the bug owner.

Thanks!

MRJ
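To illustrate the point agreed on above, here is a minimal user-space sketch of a strategy-style routine in which both early-return branches release the lock. It is not the xdf driver code: toy_mutex_t, toy_dev_t, toy_strategy() and the TOY_* return codes are hypothetical names invented for this example, and the toy lock only records whether it is held. A second toy_mutex_enter() on a held lock returns -1 at the point where the kernel would panic with a recursive mutex_enter. What the real "==" branch does besides returning is not shown in this thread, so the sketch just returns a distinct code there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a kernel mutex (assumption): tracks only "held". */
typedef struct { bool held; } toy_mutex_t;

/* Returns 0 on success, -1 where a kernel would panic on recursive entry. */
static int toy_mutex_enter(toy_mutex_t *m)
{
    if (m->held)
        return -1;
    m->held = true;
    return 0;
}

static void toy_mutex_exit(toy_mutex_t *m)
{
    assert(m->held);    /* releasing an unheld lock is a bug */
    m->held = false;
}

/* Hypothetical, heavily simplified device state. */
typedef struct {
    toy_mutex_t dev_lk; /* plays the role of vdp->xdf_dev_lk */
    uint64_t blkct;     /* device size in blocks, like p_blkct */
} toy_dev_t;

enum { TOY_OK = 0, TOY_EOF = 1, TOY_EINVAL = 2, TOY_PANIC = 3 };

static int toy_strategy(toy_dev_t *dp, uint64_t blkno)
{
    if (toy_mutex_enter(&dp->dev_lk) != 0)
        return TOY_PANIC;       /* lock leaked by an earlier call */

    if (blkno > dp->blkct) {
        /* Corresponds to the mutex_exit in the suggested fix. */
        toy_mutex_exit(&dp->dev_lk);
        return TOY_EINVAL;
    }
    if (blkno == dp->blkct) {
        /* The second early return needs the same release. */
        toy_mutex_exit(&dp->dev_lk);
        return TOY_EOF;
    }

    /* Normal path: the request would be queued here, then the lock dropped. */
    toy_mutex_exit(&dp->dev_lk);
    return TOY_OK;
}
```

If either toy_mutex_exit() call in the two early-return branches is removed, the next call to toy_strategy() on the same device returns TOY_PANIC, which is the user-space analogue of the panic discussed in this thread.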
Juergen Keil
2009-May-28 13:07 UTC
Re: 6830386: recursive mutex_enter panic when reading past EOF with xdf driver?
2009/5/28 Mark Johnson <Mark.Johnson@sun.com>:
> Yes, you are right. I've updated the bug and contacted the bug
> owner. Thanks!

A test case is to create a domU with a small disk (e.g. 1 MB), and on
the domU copy the contents of the disk to /dev/null:

dd if=/dev/rdsk/c7d0p0 of=/dev/null
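For anyone who wants to drive the same access pattern from a program rather than dd, the following is a rough sketch of a sequential reader that keeps reading until read() reports EOF. The function name read_until_eof() and the 512-byte buffer are assumptions made for this example; raw devices generally expect sector-sized, sector-aligned transfers, and the appropriate size may differ on a given system.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Read `path` sequentially until EOF.
 * Returns the total number of bytes read, or -1 on open/read error.
 */
static long long read_until_eof(const char *path)
{
    char buf[512];      /* assumed sector size */
    long long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* The final read() at or past the end of the device returns 0. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    close(fd);
    return (n < 0) ? -1 : total;
}
```

Calling read_until_eof("/dev/rdsk/c7d0p0") inside the domU would issue the same kind of read at the end of the device that the dd command above does.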