Matthew Flanagan
2006-Nov-02 06:04 UTC
[zfs-discuss] reproducible zfs panic on Solaris 10 06/06
Hi,

I am able to reproduce the following panic on a number of Solaris 10 06/06 boxes (Sun Blade 150, V210 and T2000). The script to do this is:

#!/bin/sh -x
uname -a
mkfile 100m /data
zpool create tank /data
zpool status
cd /tank
ls -al
cp /etc/services .
ls -al
cd /
rm /data
zpool status
# uncomment the following lines if you want to see the system think
# it can still read and write to the filesystem after the backing store has gone.
#date
#sleep 60
#date
#zpool status
#cd /tank
#ls -al
#cp /etc/passwd .
#ls -al
#cd /
#zpool status
zpool scrub tank
zpool status

Console output is:

SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Nov 2 16:20:36 EST 2006
PLATFORM: SUNW,Sun-Fire-V210, CSN: -, HOSTNAME: jsclient1
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 7bac21b6-76e7-ecbd-a63a-982be2230f9d
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure (write on <unknown> off 0: zio 60007432bc0 [L0 unallocated] 4000L/400P DVA[0]=<0:b000:400> DVA[1]=<0:120a000:400> fletcher4 lzjb BE contiguous birth=6 fill=0 cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6

000002a1011d3740 zfs:zio_done+284 (60007432bc0, 0, a8, 7035dbf0, 0, 60006fa9700)
  %l0-3: 00000600036d4b40 000000007035d800 0000000000000006 0000000000000006
  %l4-7: 000000007bb9a278 0000000000000002 0000000000000006 0000000000000006
000002a1011d3940 zfs:zio_vdev_io_assess+178 (60007432bc0, 8000, 10, 0, 0, 10)
  %l0-3: 0000000000000002 0000000000000002 0000000000000000 0000000000000006
  %l4-7: 0000000000000010 000000022042d79b 0000000000000000 0000056ceb2506fa
000002a1011d3a00 genunix:taskq_thread+1a4 (60001b81808, 60001b817b0, 50001, 56f0b67e471, 2a1011d3aca, 2a1011d3ac8)
  %l0-3: 0000000000010000 0000060001b817d8 0000060001b817e0 0000060001b817e2
  %l4-7: 00000600020f7310 0000000000000002 0000000000000000 0000060001b817d0

syncing file systems... 3 1 done
dumping to /dev/md/dsk/d10, offset 429588480, content: kernel
100% done: 30100 pages dumped, compression ratio 3.77, dump succeeded

Is there a fix for this?

regards

matthew

This message posted from opensolaris.org
Matthew Ahrens
2006-Nov-02 19:13 UTC
[zfs-discuss] reproducible zfs panic on Solaris 10 06/06
Matthew Flanagan wrote:
> mkfile 100m /data
> zpool create tank /data
...
> rm /data
...
> panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure (write on <unknown> off 0: zio 60007432bc0 [L0 unallocated] 4000L/400P DVA[0]=<0:b000:400> DVA[1]=<0:120a000:400> fletcher4 lzjb BE contiguous birth=6 fill=0 cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6
...
> Is there a fix for this?

Um, don't do that?

This is a known bug that we're working on.

--matt
Matthew Flanagan
2006-Nov-02 23:31 UTC
[zfs-discuss] Re: reproducible zfs panic on Solaris 10 06/06
Matt,

> Matthew Flanagan wrote:
> > mkfile 100m /data
> > zpool create tank /data
> ...
> > rm /data
> ...
> > panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure
> (write on <unknown> off 0: zio 60007432bc0 [L0
> unallocated] 4000L/400P DVA[0]=<0:b000:400>
> DVA[1]=<0:120a000:400> fletcher4 lzjb BE contiguous
> birth=6 fill=0
> cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6
> ...
> > Is there a fix for this?
>
> Um, don't do that?
>
> This is a known bug that we're working on.

What is the bugid for this, and is there an ETA for a fix?

I'm extremely surprised that this kind of bug can make it into a Solaris release. This is the second ZFS-related panic that I've found while testing it in our labs. The first caused the system to panic when the ZFS volume got close to 100% full (Sun case id #10914593).

I've just replicated this panic with a USB flash drive as well, by creating the zpool on the drive and then yanking it out. This is probably a common situation for desktop/laptop users, who would not be impressed that their otherwise robust Solaris system crashed.

regards

matthew

> --matt
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>

This message posted from opensolaris.org
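[Editor's note: for readers who want to try the USB variant described above, a minimal sketch of the steps in /bin/sh; the pool name "usbtank" and the device name c3t0d0 are placeholders, not values from the thread — substitute whatever the flash drive enumerates as on your system.]

#!/bin/sh
# Hedged sketch of the USB reproduction described above (not a verbatim
# script from the thread). "usbtank" and c3t0d0 are placeholder names.
zpool create usbtank c3t0d0
cd /usbtank
cp /etc/services .
cd /
# Physically remove the USB device at this point, then force I/O
# against the now-missing vdev; the failing write is what panics.
zpool scrub usbtank
zpool status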
Mark Maybee
2006-Nov-03 22:08 UTC
[zfs-discuss] Re: reproducible zfs panic on Solaris 10 06/06
Matthew Flanagan wrote:
> Matt,
>
>> Matthew Flanagan wrote:
>>
>>> mkfile 100m /data
>>> zpool create tank /data
>>
>> ...
>>
>>> rm /data
>>
>> ...
>>
>>> panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure
>> (write on <unknown> off 0: zio 60007432bc0 [L0
>> unallocated] 4000L/400P DVA[0]=<0:b000:400>
>> DVA[1]=<0:120a000:400> fletcher4 lzjb BE contiguous
>> birth=6 fill=0
>> cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6
>> ...
>>
>>> Is there a fix for this?
>>
>> Um, don't do that?
>>
>> This is a known bug that we're working on.
>
> What is the bugid for this, and is there an ETA for a fix?

6417779 ZFS: I/O failure (write on ...) -- need to reallocate writes
and
6322646 ZFS should gracefully handle all devices failing (when writing)

These bugs are actively being worked on, but it will probably be a while before fixes appear.

-Mark

> I'm extremely surprised that this kind of bug can make it into a Solaris release. This is the second ZFS-related panic that I've found while testing it in our labs. The first caused the system to panic when the ZFS volume got close to 100% full (Sun case id #10914593).
>
> I've just replicated this panic with a USB flash drive as well, by creating the zpool on the drive and then yanking it out. This is probably a common situation for desktop/laptop users, who would not be impressed that their otherwise robust Solaris system crashed.
>
> regards
>
> matthew
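[Editor's note: until fixes for those bugs appear, one way to avoid the failing-write path with a file-backed pool is to export the pool before removing its backing store. A minimal sketch, assuming the same /data file as the original repro script; this is an illustration, not a workaround endorsed in the thread.]

#!/bin/sh
# Hedged sketch (not from the thread): export the pool before deleting
# its backing file, so no writes are ever issued to a missing vdev.
mkfile 100m /data
zpool create tank /data
cp /etc/services /tank/
zpool export tank   # flushes outstanding I/O and releases the backing file
rm /data            # safe now; the pool no longer references it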
Akhilesh Mritunjai
2006-Nov-04 10:44 UTC
[zfs-discuss] Re: reproducible zfs panic on Solaris 10 06/06
> zpool status
> # uncomment the following lines if you want to see the system think
> # it can still read and write to the filesystem after the backing store has gone.

Hi

The UNIX unlink() syscall doesn't remove the inode if it is still in use; the inode is only marked to be unlinked once its use count falls to zero. So deleting a file has no effect on applications that already have it open. I'm not surprised by the "yanking out the USB drive" test (we already know that bug exists), but this unlink test is puzzling me. Does the ZFS subsystem close and re-open its files in the course of normal usage?

This message posted from opensolaris.org
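[Editor's note: the unlink-while-open behaviour described above is easy to demonstrate outside ZFS. A minimal /bin/sh sketch; the /tmp path and file name are just examples, not taken from the thread.]

#!/bin/sh
# Hedged demonstration of unlink-while-open semantics (not from the thread).
tmpfile=/tmp/unlink_demo.$$
echo "still reachable through the open descriptor" > "$tmpfile"
exec 3< "$tmpfile"   # hold the file open on file descriptor 3
rm "$tmpfile"        # directory entry gone, but the inode survives
ls -l "$tmpfile"     # fails: no such file or directory
cat <&3              # still prints the data via the open descriptor
exec 3<&-            # closing the last reference frees the inode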