Matthew Flanagan
2006-Nov-02 06:04 UTC
[zfs-discuss] reproducible zfs panic on Solaris 10 06/06
Hi,

I am able to reproduce the following panic on a number of Solaris 10 06/06 boxes (Sun Blade 150, V210 and T2000). The script to do this is:

#!/bin/sh -x
uname -a
mkfile 100m /data
zpool create tank /data
zpool status
cd /tank
ls -al
cp /etc/services .
ls -al
cd /
rm /data
zpool status
# uncomment the following lines if you want to see the system think
# it can still read and write to the filesystem after the backing store has gone.
#date
#sleep 60
#date
#zpool status
#cd /tank
#ls -al
#cp /etc/passwd .
#ls -al
#cd /
#zpool status
zpool scrub tank
zpool status

Console output is:

SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Nov 2 16:20:36 EST 2006
PLATFORM: SUNW,Sun-Fire-V210, CSN: -, HOSTNAME: jsclient1
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 7bac21b6-76e7-ecbd-a63a-982be2230f9d
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure (write on <unknown> off 0: zio 60007432bc0 [L0 unallocated] 4000L/400P DVA[0]=<0:b000:400> DVA[1]=<0:120a000:400> fletcher4 lzjb BE contiguous birth=6 fill=0 cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6

000002a1011d3740 zfs:zio_done+284 (60007432bc0, 0, a8, 7035dbf0, 0, 60006fa9700)
  %l0-3: 00000600036d4b40 000000007035d800 0000000000000006 0000000000000006
  %l4-7: 000000007bb9a278 0000000000000002 0000000000000006 0000000000000006
000002a1011d3940 zfs:zio_vdev_io_assess+178 (60007432bc0, 8000, 10, 0, 0, 10)
  %l0-3: 0000000000000002 0000000000000002 0000000000000000 0000000000000006
  %l4-7: 0000000000000010 000000022042d79b 0000000000000000 0000056ceb2506fa
000002a1011d3a00 genunix:taskq_thread+1a4 (60001b81808, 60001b817b0, 50001, 56f0b67e471, 2a1011d3aca, 2a1011d3ac8)
  %l0-3: 0000000000010000 0000060001b817d8 0000060001b817e0 0000060001b817e2
  %l4-7: 00000600020f7310 0000000000000002 0000000000000000 0000060001b817d0

syncing file systems... 3 1 done
dumping to /dev/md/dsk/d10, offset 429588480, content: kernel
100% done: 30100 pages dumped, compression ratio 3.77, dump succeeded

Is there a fix for this?

regards

matthew

This message posted from opensolaris.org
Matthew Ahrens
2006-Nov-02 19:13 UTC
[zfs-discuss] reproducible zfs panic on Solaris 10 06/06
Matthew Flanagan wrote:
> mkfile 100m /data
> zpool create tank /data
...
> rm /data
...
> panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure (write on <unknown> off 0: zio 60007432bc0 [L0 unallocated] 4000L/400P DVA[0]=<0:b000:400> DVA[1]=<0:120a000:400> fletcher4 lzjb BE contiguous birth=6 fill=0 cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6
...
> Is there a fix for this?

Um, don't do that?

This is a known bug that we're working on.

--matt
Matthew Flanagan
2006-Nov-02 23:31 UTC
[zfs-discuss] Re: reproducible zfs panic on Solaris 10 06/06
Matt,

> Matthew Flanagan wrote:
> > mkfile 100m /data
> > zpool create tank /data
> ...
> > rm /data
> ...
> > panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure
> (write on <unknown> off 0: zio 60007432bc0 [L0
> unallocated] 4000L/400P DVA[0]=<0:b000:400>
> DVA[1]=<0:120a000:400> fletcher4 lzjb BE contiguous
> birth=6 fill=0
> cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6
> ...
> > Is there a fix for this?
>
> Um, don't do that?
>
> This is a known bug that we're working on.

What is the bugid for this, and is there an ETA for a fix?

I'm extremely surprised that this kind of bug can make it into a Solaris release. This is the second ZFS-related panic that I've found while testing it in our labs. The first caused the system to panic when the ZFS volume got close to 100% full (Sun case id #10914593).

I've just replicated this panic with a USB flash drive as well, by creating the zpool on the drive and then yanking it out. This is probably a common situation for desktop/laptop users, who would not be impressed that their otherwise robust Solaris system crashed.

regards

matthew

> --matt
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>

This message posted from opensolaris.org
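[Editor's note: for readers who want to try the USB variant described above, a minimal sketch of the steps in /bin/sh; the pool name "usbtank" and the device name c3t0d0 are placeholders, not values from the thread — substitute whatever the flash drive enumerates as on your system.]

#!/bin/sh
# Hedged sketch of the USB reproduction described above (not a verbatim
# script from the thread). "usbtank" and c3t0d0 are placeholder names.
zpool create usbtank c3t0d0
cd /usbtank
cp /etc/services .
cd /
# Physically remove the USB device at this point, then force I/O
# against the now-missing vdev; the failing write is what panics.
zpool scrub usbtank
zpool status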
Mark Maybee
2006-Nov-03 22:08 UTC
[zfs-discuss] Re: reproducible zfs panic on Solaris 10 06/06
Matthew Flanagan wrote:
> Matt,
>
>> Matthew Flanagan wrote:
>>
>>> mkfile 100m /data
>>> zpool create tank /data
>>
>> ...
>>
>>> rm /data
>>
>> ...
>>
>>> panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure
>> (write on <unknown> off 0: zio 60007432bc0 [L0
>> unallocated] 4000L/400P DVA[0]=<0:b000:400>
>> DVA[1]=<0:120a000:400> fletcher4 lzjb BE contiguous
>> birth=6 fill=0
>> cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6
>> ...
>>
>>> Is there a fix for this?
>>
>> Um, don't do that?
>>
>> This is a known bug that we're working on.
>
> What is the bugid for this, and is there an ETA for a fix?

6417779 ZFS: I/O failure (write on ...) -- need to reallocate writes
and
6322646 ZFS should gracefully handle all devices failing (when writing)

These bugs are actively being worked on, but it will probably be a while before fixes appear.

-Mark

> I'm extremely surprised that this kind of bug can make it into a Solaris release. This is the second ZFS-related panic that I've found while testing it in our labs. The first caused the system to panic when the ZFS volume got close to 100% full (Sun case id #10914593).
>
> I've just replicated this panic with a USB flash drive as well, by creating the zpool on the drive and then yanking it out. This is probably a common situation for desktop/laptop users, who would not be impressed that their otherwise robust Solaris system crashed.
>
> regards
>
> matthew
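[Editor's note: until fixes for those bugs appear, one way to avoid the failing-write path with a file-backed pool is to export the pool before removing its backing store. A minimal sketch, assuming the same /data file as the original repro script; this is an illustration, not a workaround endorsed in the thread.]

#!/bin/sh
# Hedged sketch (not from the thread): export the pool before deleting
# its backing file, so no writes are ever issued to a missing vdev.
mkfile 100m /data
zpool create tank /data
cp /etc/services /tank/
zpool export tank   # flushes outstanding I/O and releases the backing file
rm /data            # safe now; the pool no longer references it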
Akhilesh Mritunjai
2006-Nov-04 10:44 UTC
[zfs-discuss] Re: reproducible zfs panic on Solaris 10 06/06
> zpool status
> # uncomment the following lines if you want to see the system think
> # it can still read and write to the filesystem after the backing store has gone.

Hi

The UNIX unlink() syscall doesn't remove the inode if it is still in use; the inode is only marked to be unlinked once its use count falls to zero. So deleting a file has no effect on applications that already have it open. I'm not surprised by the "yanking out the USB drive" test (we already know that bug exists), but this unlink test is puzzling me. Does the ZFS subsystem close and re-open its files in the course of normal usage?

This message posted from opensolaris.org
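[Editor's note: the unlink-while-open behaviour described above is easy to demonstrate outside ZFS. A minimal /bin/sh sketch; the /tmp path and file name are just examples, not taken from the thread.]

#!/bin/sh
# Hedged demonstration of unlink-while-open semantics (not from the thread).
tmpfile=/tmp/unlink_demo.$$
echo "still reachable through the open descriptor" > "$tmpfile"
exec 3< "$tmpfile"   # hold the file open on file descriptor 3
rm "$tmpfile"        # directory entry gone, but the inode survives
ls -l "$tmpfile"     # fails: no such file or directory
cat <&3              # still prints the data via the open descriptor
exec 3<&-            # closing the last reference frees the inode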