Dmitry Morozovsky
2008-Feb-02 16:42 UTC
7.0-PRE/amd64 crash with Promise TX4 and eSATA disk
Dear colleagues, during rsycn from eSATA drive connected to Promise TX4 (ad12) to ZFS pool eSATA disk got disconnected. With the next access, system crashed: Unread portion of the kernel message buffer: Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x3020e0b30 fault code = supervisor read data, page not present instruction pointer = 0x8:0xffffffff801cac9d stack pointer = 0x10:0xffffffffd79f7800 frame pointer = 0x10:0xffffffffd79f7840 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 898 (tcsh) trap number = 12 panic: page fault cpuid = 1 Uptime: 7h49m35s Physical memory: 4087 MB Dumping 677 MB: 662 646 630 614 598 582 566 550 534 518 502 486 470 454 438 422 406 390 374 358 342 326 310 294 278 262 246 230 214 198 182 166 150 134 118 102 86 70 54 38 22 6 #0 doadump () at pcpu.h:194 194 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump () at pcpu.h:194 #1 0x0000000000000031 in ?? () #2 0xffffffff80219c30 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #3 0xffffffff8021a04d in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:563 #4 0xffffffff80353284 in trap_fatal (frame=0xffffff000179c340, eva=18446742974222725120) at /usr/src/sys/amd64/amd64/trap.c:724 #5 0xffffffff80353655 in trap_pfault (frame=0xffffffffd79f7750, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641 #6 0xffffffff80353ffb in trap (frame=0xffffffffd79f7750) at /usr/src/sys/amd64/amd64/trap.c:410 #7 0xffffffff80339dbe in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169 #8 0xffffffff801cac9d in dev2udev (x=0xffffff0001779400) at /usr/src/sys/fs/devfs/devfs_vnops.c:1325 #9 0xffffffff80310ab8 in ufs_getattr (ap=Variable "ap" is not available. ) at /usr/src/sys/ufs/ufs/ufs_vnops.c:401 #10 0xffffffff8029dcf3 in vn_stat (vp=0xffffff004728b9b0, sb=0xffffffffd79f79f0, active_cred=Variable "active_cred" is not available. ) at vnode_if.h:286 #11 0xffffffff802953b1 in kern_stat (td=0xffffff000179c340, path=0x44b66a <Address 0x44b66a out of bounds>, pathseg=Variable "pathseg" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2112 #12 0xffffffff80295507 in stat (td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2093 #13 0xffffffff803538da in syscall (frame=0xffffffffd79f7c70) at /usr/src/sys/amd64/amd64/trap.c:852 #14 0xffffffff80339fcb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290 #15 0x00000000809c235c in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) I would be glad to provide additional info to investigate and hopefully fix the problem. Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------
On Sun, Feb 03, 2008 at 03:04:52AM +0300, Dmitry Morozovsky wrote:> Dear colleagues, > > during rsycn from eSATA drive connected to Promise TX4 (ad12) to ZFS pool > eSATA disk got disconnected. With the next access, system crashed: > > Unread portion of the kernel message buffer: > > > Fatal trap 12: page fault while in kernel mode > cpuid = 1; apic id = 01 > fault virtual address = 0x3020e0b30 > fault code = supervisor read data, page not present > instruction pointer = 0x8:0xffffffff801cac9d > stack pointer = 0x10:0xffffffffd79f7800 > frame pointer = 0x10:0xffffffffd79f7840 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 898 (tcsh) > trap number = 12 > panic: page fault > cpuid = 1 > Uptime: 7h49m35s > Physical memory: 4087 MB > Dumping 677 MB: 662 646 630 614 598 582 566 550 534 518 502 486 470 454 438 422 > 406 390 374 358 342 326 310 294 278 262 246 230 214 198 182 166 150 134 118 102 > 86 70 54 38 22 6 > > #0 doadump () at pcpu.h:194 > 194 pcpu.h: No such file or directory. > in pcpu.h > (kgdb) bt > #0 doadump () at pcpu.h:194 > #1 0x0000000000000031 in ?? () > #2 0xffffffff80219c30 in boot (howto=260) at > /usr/src/sys/kern/kern_shutdown.c:409 > #3 0xffffffff8021a04d in panic (fmt=0x104 <Address 0x104 out of bounds>) at > /usr/src/sys/kern/kern_shutdown.c:563 > #4 0xffffffff80353284 in trap_fatal (frame=0xffffff000179c340, > eva=18446742974222725120) at /usr/src/sys/amd64/amd64/trap.c:724 > #5 0xffffffff80353655 in trap_pfault (frame=0xffffffffd79f7750, usermode=0) at > /usr/src/sys/amd64/amd64/trap.c:641 > #6 0xffffffff80353ffb in trap (frame=0xffffffffd79f7750) at > /usr/src/sys/amd64/amd64/trap.c:410 > #7 0xffffffff80339dbe in calltrap () at > /usr/src/sys/amd64/amd64/exception.S:169 > #8 0xffffffff801cac9d in dev2udev (x=0xffffff0001779400) at > /usr/src/sys/fs/devfs/devfs_vnops.c:1325 > #9 0xffffffff80310ab8 in ufs_getattr (ap=Variable "ap" is not available. > ) at /usr/src/sys/ufs/ufs/ufs_vnops.c:401 > #10 0xffffffff8029dcf3 in vn_stat (vp=0xffffff004728b9b0, > sb=0xffffffffd79f79f0, active_cred=Variable "active_cred" is not available. > ) at vnode_if.h:286 > #11 0xffffffff802953b1 in kern_stat (td=0xffffff000179c340, path=0x44b66a > <Address 0x44b66a out of bounds>, pathseg=Variable "pathseg" is not available. > ) at /usr/src/sys/kern/vfs_syscalls.c:2112 > #12 0xffffffff80295507 in stat (td=Variable "td" is not available. > ) at /usr/src/sys/kern/vfs_syscalls.c:2093 > #13 0xffffffff803538da in syscall (frame=0xffffffffd79f7c70) at > /usr/src/sys/amd64/amd64/trap.c:852 > #14 0xffffffff80339fcb in Xfast_syscall () at > /usr/src/sys/amd64/amd64/exception.S:290 > #15 0x00000000809c235c in ?? () > Previous frame inner to this frame (corrupt stack?) > (kgdb) > > I would be glad to provide additional info to investigate and hopefully fix the > problem. > > Sincerely, > D.Marck [DM5020, MCK-RIPE, DM3-RIPN] > [ FreeBSD committer: marck@FreeBSD.org ] > ------------------------------------------------------------------------ > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** > ------------------------------------------------------------------------Di you have the UFS volume mounted from the eSATA drive ? If yes, then the panic is the natural consequence of the device disappearing from under the UFS. If not, and fault address 0x3020e0b30 looks suspicious, it could mean some kernel memory corruption. Anyway, it would be interesting to look at the vnode vp content from the frame #10, and to lookup the mount point together with a device it comes from. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20080203/b6118433/attachment.pgp
Dmitry Morozovsky
2008-Feb-03 01:05 UTC
7.0-PRE/amd64 crash with Promise TX4 and eSATA disk
On Sun, 3 Feb 2008, Kostik Belousov wrote: KB> Di you have the UFS volume mounted from the eSATA drive ? If yes, then the KB> panic is the natural consequence of the device disappearing from under the KB> UFS. If not, and fault address 0x3020e0b30 looks suspicious, it could mean KB> some kernel memory corruption. yes, there is (were) UFS2 on eSATA. KB> Anyway, it would be interesting to look at the vnode vp content from the KB> frame #10, and to lookup the mount point together with a device it comes KB> from. (kgdb) fr 10 #10 0xffffffff8029dcf3 in vn_stat (vp=0xffffff004728b9b0, sb=0xffffffffd79f79f0, active_cred=Variable "active_cred" is not available. ) at vnode_if.h:286 286 vnode_if.h: No such file or directory. in vnode_if.h (kgdb) p vp $1 = (struct vnode *) 0xffffff004728b9b0 (kgdb) p *vp $2 = {v_type = VDIR, v_tag = 0xffffffff8039319c "ufs", v_op = 0xffffffff804e98e0, v_data = 0xffffff003fab0480, v_mount = 0xffffff00050dc650, v_nmntvnodes = { tqe_next = 0xffffff004728bba0, tqe_prev = 0xffffff004728f218}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0xffffffff808c10e0}, v_hash = 215211, v_cache_src = {lh_first = 0xffffff003f4d5000}, v_cache_dst = {tqh_first = 0xffffff0026fcca90, tqh_last = 0xffffff0026fccab0}, v_dd = 0xffffff00470a49b0, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_lock = {lk_object = {lo_name = 0xffffffff8039319c "ufs", lo_type = 0xffffffff8039319c "ufs", lo_flags = 70844416, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xffffffff80514730, lk_flags = 262208, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 51, lk_lockholder = 0xffffff000179c340, lk_newlock = 0x0}, v_interlock = {lock_object = { lo_name = 0xffffffff8039e4da "vnode interlock", lo_type = 0xffffffff8039e4da "vnode interlock", lo_flags = 16973824, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, v_vnlock = 0xffffff004728ba48, v_holdcnt = 3, v_usecount = 2, v_iflag = 0, v_vflag = 0, v_writecount = 0, v_freelist = {tqe_next = 0x0, tqe_prev = 0xffffff004728b900}, v_bufobj = {bo_mtx = 0xffffff004728ba98, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xffffff004728bb08}, bv_root = 0x0, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xffffff004728bb28}, bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_ops = 0xffffffff804dd3e0, bo_bsize = 65536, bo_object = 0xffffff0047994680, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xffffff004728b9b0, __bo_vnode = 0xffffff004728b9b0}, v_pollinfo = 0x0, v_label = 0x0} I think tere are at least two problems here: - panic when non-essential UFS mounted partition disappears - particular disappearing eSATA drive from eSATA channel of TX4. Relevant error messages are Feb 2 19:29:18 <kern.crit> hamster kernel: ata7: reiniting channel .. Feb 2 19:29:18 <kern.crit> hamster kernel: ata7: SATA connect time=0ms Feb 2 19:29:18 <kern.crit> hamster kernel: ata7: reset tp1 mask=01 ostat0=d0 ostat1=00 Feb 2 19:29:18 <kern.crit> hamster kernel: ata7: stat0=0xd0 err=0x00 lsb=0x36 msb=0x72 Feb 2 19:29:26 <kern.crit> hamster last message repeated 87 times Feb 2 19:29:27 <kern.crit> hamster kernel: Feb 2 19:29:27 <kern.crit> hamster kernel: ata7: stat0=0xd0 err=0x00 lsb=0x36 msb=0x72 Feb 2 19:29:49 <kern.crit> hamster last message repeated 221 times Feb 2 19:29:49 <kern.crit> hamster kernel: ata7: reset tp2 stat0=d0 stat1=00 devices=0x0 Feb 2 19:29:49 <kern.crit> hamster kernel: ad14: FAILURE - device detached Feb 2 19:29:49 <kern.crit> hamster kernel: ad1g4_:v fdse_tdaocnhee(d): Feb 2 19:29:49 <kern.crit> hamster kernel: ad14a[aRtEaA7D:( orfefisneitt= d1o2n4e0 9.1.43 Feb 2 19:29:49 <kern.crit> hamster kernel: 2960, length=131072)]error = 6 Feb 2 19:29:49 <kern.crit> hamster kernel: g_vfs_done():ad14a[READ(offset=124091564032, length=131072)]error = 6 Feb 2 19:29:49 <kern.crit> hamster kernel: g_vfs_done():ad14a[READ(offset=124091695104, length=131072)]error = 6 ... and zillion of g_vfs_gone after. Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------
On Sun, Feb 03, 2008 at 12:05:44PM +0300, Dmitry Morozovsky wrote:> On Sun, 3 Feb 2008, Kostik Belousov wrote: > > KB> Di you have the UFS volume mounted from the eSATA drive ? If yes, then the > KB> panic is the natural consequence of the device disappearing from under the > KB> UFS. If not, and fault address 0x3020e0b30 looks suspicious, it could mean > KB> some kernel memory corruption. > > yes, there is (were) UFS2 on eSATA. > > KB> Anyway, it would be interesting to look at the vnode vp content from the > KB> frame #10, and to lookup the mount point together with a device it comes > KB> from. > > (kgdb) fr 10 > #10 0xffffffff8029dcf3 in vn_stat (vp=0xffffff004728b9b0, > sb=0xffffffffd79f79f0, active_cred=Variable "active_cred" is not available. > ) at vnode_if.h:286 > 286 vnode_if.h: No such file or directory. > in vnode_if.h > (kgdb) p vp > $1 = (struct vnode *) 0xffffff004728b9b0 > (kgdb) p *vp > $2 = {v_type = VDIR, v_tag = 0xffffffff8039319c "ufs", v_op = > 0xffffffff804e98e0, v_data = 0xffffff003fab0480, v_mount = 0xffffff00050dc650,The *v_mount and *(struct ufs_mount *)(v_mount->mnt_data) content shall be enough to confirm that vnode comes from the lost partition.> v_nmntvnodes = { > tqe_next = 0xffffff004728bba0, tqe_prev = 0xffffff004728f218}, v_un = > {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist > = {le_next = 0x0, > le_prev = 0xffffffff808c10e0}, v_hash = 215211, v_cache_src = {lh_first = > 0xffffff003f4d5000}, v_cache_dst = {tqh_first = 0xffffff0026fcca90, tqh_last = > 0xffffff0026fccab0}, > v_dd = 0xffffff00470a49b0, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = > 0, v_lock = {lk_object = {lo_name = 0xffffffff8039319c "ufs", lo_type = > 0xffffffff8039319c "ufs", > lo_flags = 70844416, lo_witness_data = {lod_list = {stqe_next = 0x0}, > lod_witness = 0x0}}, lk_interlock = 0xffffffff80514730, lk_flags = 262208, > lk_sharecount = 0, > lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 51, > lk_lockholder = 0xffffff000179c340, lk_newlock = 0x0}, v_interlock = > {lock_object = { > lo_name = 0xffffffff8039e4da "vnode interlock", lo_type = > 0xffffffff8039e4da "vnode interlock", lo_flags = 16973824, lo_witness_data = > {lod_list = {stqe_next = 0x0}, > lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, v_vnlock = > 0xffffff004728ba48, v_holdcnt = 3, v_usecount = 2, v_iflag = 0, v_vflag = 0, > v_writecount = 0, > v_freelist = {tqe_next = 0x0, tqe_prev = 0xffffff004728b900}, v_bufobj = > {bo_mtx = 0xffffff004728ba98, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = > 0xffffff004728bb08}, > bv_root = 0x0, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, > tqh_last = 0xffffff004728bb28}, bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 0, > bo_flag = 0, > bo_ops = 0xffffffff804dd3e0, bo_bsize = 65536, bo_object = > 0xffffff0047994680, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = > 0xffffff004728b9b0, > __bo_vnode = 0xffffff004728b9b0}, v_pollinfo = 0x0, v_label = 0x0} > > > I think tere are at least two problems here: > - panic when non-essential UFS mounted partition disappearsUnfortunately, FreeBSD has no concept of the unessential mount; I wish the mount option onerror=nopanic too :).> - particular disappearing eSATA drive from eSATA channel of TX4. Relevant error > messages areThis looks more like the hardware problem, and it only induced the known kernel deficiency.> > Feb 2 19:29:18 <kern.crit> hamster kernel: ata7: reiniting channel .. > Feb 2 19:29:18 <kern.crit> hamster kernel: ata7: SATA connect time=0ms > Feb 2 19:29:18 <kern.crit> hamster kernel: ata7: reset tp1 mask=01 ostat0=d0 > ostat1=00 > Feb 2 19:29:18 <kern.crit> hamster kernel: ata7: stat0=0xd0 err=0x00 lsb=0x36 > msb=0x72 > Feb 2 19:29:26 <kern.crit> hamster last message repeated 87 times > Feb 2 19:29:27 <kern.crit> hamster kernel: > Feb 2 19:29:27 <kern.crit> hamster kernel: ata7: stat0=0xd0 err=0x00 lsb=0x36 > msb=0x72 > Feb 2 19:29:49 <kern.crit> hamster last message repeated 221 times > Feb 2 19:29:49 <kern.crit> hamster kernel: ata7: reset tp2 stat0=d0 stat1=00 > devices=0x0 > Feb 2 19:29:49 <kern.crit> hamster kernel: ad14: FAILURE - device detached > Feb 2 19:29:49 <kern.crit> hamster kernel: ad1g4_:v fdse_tdaocnhee(d): > Feb 2 19:29:49 <kern.crit> hamster kernel: ad14a[aRtEaA7D:( orfefisneitt= > d1o2n4e0 9.1.43 > Feb 2 19:29:49 <kern.crit> hamster kernel: 2960, length=131072)]error = 6 > Feb 2 19:29:49 <kern.crit> hamster kernel: > g_vfs_done():ad14a[READ(offset=124091564032, length=131072)]error = 6 > Feb 2 19:29:49 <kern.crit> hamster kernel: > g_vfs_done():ad14a[READ(offset=124091695104, length=131072)]error = 6 > > ... and zillion of g_vfs_gone after. > > Sincerely, > D.Marck [DM5020, MCK-RIPE, DM3-RIPN] > [ FreeBSD committer: marck@FreeBSD.org ] > ------------------------------------------------------------------------ > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** > -------------------------------------------------------------------------------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20080203/832543af/attachment.pgp