David J. Orman
2006-Mar-22 03:54 UTC
[zfs-discuss] Crash while doing zfs destroy on a snapshot
Hi,

I ran zfs destroy blah@blah and it sat for quite some time with nothing happening. I tried to ^C it; the attempt was echoed, but the command didn't die. zpool iostat was showing no activity. I'm used to snapshot destroys taking a second at most, so this was odd. I then did some work on a machine that was using the ZFS shares, and suddenly the Solaris box was gone. I was remote at the time.

When I arrived at the location of the servers, the Solaris box had rebooted on its own and was sitting at a prompt warning that the fs wasn't synced, so I needed to reboot and clean it up. I did that, then booted back into Solaris. Everything seems fine now, and a destroy on that same snapshot took less than a second.

I'd like to help determine the cause of this problem, but I need your help. I know very little about Solaris, and I don't even know where to begin gathering information for you. It's Solaris Express 3/06. If you can tell me what to give you and how to get it, I will do my best!

Cheers,
David
David J. Orman
2006-Mar-22 04:03 UTC
[zfs-discuss] Crash while doing zfs destroy on a snapshot
After reading another post on the ZFS mailing list, I saw how he gave some output. So I'm doing the same. I hope it's the correct information!

# mdb unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip sctp usba fcp fctl md lofs nca random crypto zfs fcip logindmux ptm sppp nfs ipc ]
> ::status
debugging crash dump vmcore.0 (32-bit) from server2
operating system: 5.11 snv_33 (i86pc)
panic message:
assertion failed: traverse(vpp) == 0, file: ../../common/fs/zfs/zfs_ctldir.c, line: 619
dump content: kernel pages only
> ::stack
vpanic(fe9a5178, f8c9a1a4, f8c9a180, 26b)
assfail+0x5c(f8c9a1a4, f8c9a180, 26b)
zfsctl_snapdir_lookup+0x393(d3120840, d214dcc0, d214de14, d214de90, 0, d1a2bc00)
fop_lookup+0x2f(d3120840, d214dcc0, d214de14, d214de90, 0, d1a2bc00)
lookuppnvp+0x2c8(d214de90, 0, 1, 0, d214df58, d1a2bc00)
lookuppnat+0xea(d214de90, 0, 1, 0, d214df58, 0)
lookupnameat+0x54(80dec08, 0, 1, 0, d214df58, 0)
cstatat_getvp+0x13f(ffd19553, 80dec08, 1, 1, d214df58, d214df5c)
cstatat64+0x37()
stat64+0x1c()
sys_sysenter+0x100()
> *panic_thread::findstack -v
stack pointer for thread d2655000: d214da80
  d214da98 panic+0x12(fe9a5178, f8c9a1a4, f8c9a180, 26b)
  d214dabc assfail+0x5c(f8c9a1a4, f8c9a180, 26b)
  d214dc50 zfsctl_snapdir_lookup+0x393(d3120840, d214dcc0, d214de14, d214de90, 0, d1a2bc00)
  d214dc90 fop_lookup+0x2f(d3120840, d214dcc0, d214de14, d214de90, 0, d1a2bc00)
  d214de1c lookuppnvp+0x2c8(d214de90, 0, 1, 0, d214df58, d1a2bc00)
  d214de64 lookuppnat+0xea(d214de90, 0, 1, 0, d214df58, 0)
  d214dee4 lookupnameat+0x54(80dec08, 0, 1, 0, d214df58, 0)
  d214df28 cstatat_getvp+0x13f(ffd19553, 80dec08, 1, 1, d214df58, d214df5c)
  d214df60 cstatat64+0x37()
  d214df84 stat64+0x1c()
  d214dfac sys_sysenter+0x100()
>
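A quick cross-check on the dump above, offered only as a sketch: the last argument to assfail() in the stack is 26b. Assuming that argument is the source line number printed in hex, it should match the "line: 619" in the panic message. The variable names below are made up for illustration.

```shell
# Hex value taken from the assfail() frame in the stack trace.
line_hex=26b

# POSIX printf accepts C-style hex constants for %d, so this converts to decimal.
line_dec=$(printf '%d' "0x${line_hex}")

echo "assfail line argument: ${line_dec}"
```

If the printed number agrees with the panic message, the stack and the message are describing the same failed assertion in zfs_ctldir.c.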
James C. McPherson
2006-Mar-22 04:07 UTC
[zfs-discuss] Crash while doing zfs destroy on a snapshot
David J. Orman wrote:
...
> I'd like to help determine the cause of this problem, but I need your
> help. I know very very little about Solaris, and I don't even know where
> to begin gathering information to provide you all with. It's solaris
> express 3/06. If you can tell me what to give you/how to get it, I will
> do my best!

Hi David,
wander on over to /var/crash/`nodename` and look for two files called

unix.[X] and
vmcore.[X]

where X is a number from 0 to ...

You can use mdb to get the msgbuf and panic thread's stack. Assuming this is crash 0 on the box, run mdb like this:

# mdb -k 0

::log /tmp/zfs_crash
::status
::modinfo ! grep zfs
::msgbuf
*panic_thread::findstack -v

Then if you could post the output (from /tmp/zfs_crash) either to this list or to http://pastebin.ca, people can start helping figure out what was going on with your system.

cheers,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
David J. Orman
2006-Mar-22 04:17 UTC
[zfs-discuss] Crash while doing zfs destroy on a snapshot
http://corenode.com/~ormandj/files/zfs_crash

Let me know what to do next,
David
Eric Schrock
2006-Mar-22 04:20 UTC
[zfs-discuss] Crash while doing zfs destroy on a snapshot
You _should_ have a crash dump waiting for you when the system comes back up. Check out:

# cd /var/crash/<machinename>
# mdb 0        (or whatever the highest vmcore.X number is)

(in MDB)
> ::status
> $C
> ::msgbuf

That'll give us some basic information about what caused the crash, and we can go from there. With any luck you've hit a bug that we've already fixed (SX 03/06 is somewhat old in this regard).

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
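For anyone unsure which number to hand to mdb, here is a minimal shell sketch of the "highest vmcore.X" step. The function name latest_dump_index is hypothetical, and the usage line assumes the crash directory is named after the output of uname -n; adjust it if your dump directory is configured differently.

```shell
# Print the highest N for which DIR/vmcore.N exists; print nothing if none.
latest_dump_index() {
    dir=$1
    best=
    for f in "$dir"/vmcore.*; do
        # Skip the unexpanded pattern when no files match.
        [ -e "$f" ] || continue
        n=${f##*.}
        # Ignore suffixes that are not plain decimal numbers.
        case $n in
            ''|*[!0-9]*) continue ;;
        esac
        if [ -z "$best" ] || [ "$n" -gt "$best" ]; then
            best=$n
        fi
    done
    if [ -n "$best" ]; then
        echo "$best"
    fi
}

# Usage sketch (not executed here):
#   cd "/var/crash/$(uname -n)" && mdb "$(latest_dump_index .)"
```

The comparison is numeric rather than alphabetical, so vmcore.10 is treated as newer than vmcore.2.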
James C. McPherson
2006-Mar-22 04:26 UTC
[zfs-discuss] Crash while doing zfs destroy on a snapshot
David J. Orman wrote:
> After reading another post on the ZFS mailing list, I saw how he gave
> some output. So I'm doing the same. I hope it's the correct information!
...
> panic message:
> assertion failed: traverse(vpp) == 0, file: ../../common/fs/zfs/zfs_ctldir.c, line: 619
...
> *panic_thread::findstack -v
> stack pointer for thread d2655000: d214da80
>   d214da98 panic+0x12(fe9a5178, f8c9a1a4, f8c9a180, 26b)
>   d214dabc assfail+0x5c(f8c9a1a4, f8c9a180, 26b)
>   d214dc50 zfsctl_snapdir_lookup+0x393(d3120840, d214dcc0, d214de14, d214de90, 0, d1a2bc00)
>   d214dc90 fop_lookup+0x2f(d3120840, d214dcc0, d214de14, d214de90, 0, d1a2bc00)
>   d214de1c lookuppnvp+0x2c8(d214de90, 0, 1, 0, d214df58, d1a2bc00)
>   d214de64 lookuppnat+0xea(d214de90, 0, 1, 0, d214df58, 0)
>   d214dee4 lookupnameat+0x54(80dec08, 0, 1, 0, d214df58, 0)
>   d214df28 cstatat_getvp+0x13f(ffd19553, 80dec08, 1, 1, d214df58, d214df5c)
>   d214df60 cstatat64+0x37()
>   d214df84 stat64+0x1c()
>   d214dfac sys_sysenter+0x100()

Hi David,
thanks for the info. The stack looks like a dead ringer for

6374110 ZFS Panic in zfs_lookup() during snapshot creation
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6374110

But don't quote me on that - I'm not a zfs expert.

According to internal resources, the above bug is fixed in snv_36 - Nevada build 36 - which should be available either RSN or from http://dlc.sun.com/osol/on/downloads/current/

best regards,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
Mark Maybee
2006-Mar-22 20:45 UTC
[zfs-discuss] Crash while doing zfs destroy on a snapshot
David,

As James McPherson pointed out, this is almost certainly a dup of a bug that has been fixed in recent bits. It's not clear, however, why you experienced the delay before the panic. Without a kernel dump taken while that was happening, it really isn't possible to troubleshoot that issue.

There have been a number of bug fixes in this part of the code in the last few weeks. I suggest you try running newer bits and see if that makes things more stable for you. If you see this problem again (with the new bits), we will track it down.

-Mark