Jason Williams
2006-Jun-14 08:23 UTC
[zfs-code] ZFS triggers kernel panic on T2000 (SXb41)
Setup:
-T2000 running Solaris Express Build 41.
-Qlogic 2342 HBA (using both ports multipathed via MPXIO).
-StorageTek FLX210 (Engenio 2882) FC array sliced into two 6 disk RAID-1 volumes (multipathed via MPXIO).
-Brocade SilkWorm 3850 running FabricOS 4.2.0.
-Created a striped ZFS zpool containing the two RAID-1 volumes as its two members.

Repro:
-Fail the first device (RAID-1 volume) in the zpool.
-Wait 30 seconds to 1 minute.
-Fail the second device (RAID-1 volume) in the zpool.

Result:
-Immediate kernel panic & kernel dump.
-The ALOM on the T2000 registered this error code:
SC Alert: Host detected fault, MSGID: ZFS-8000-CS

Should I post this as a bug? Or is it known already?

--
This message posted from opensolaris.org
James C. McPherson
2006-Jun-14 08:41 UTC
[zfs-discuss] Re: [zfs-code] ZFS triggers kernel panic on T2000 (SXb41)
Jason Williams wrote:
> Setup:
> -T2000 running Solaris Express Build 41.
> -Qlogic 2342 HBA (using both ports multipathed via MPXIO).
> -StorageTek FLX210 (Engenio 2882) FC array sliced into two 6 disk RAID-1 volumes (multipathed via MPXIO).
> -Brocade SilkWorm 3850 running FabricOS 4.2.0.
> -Created a striped ZFS zpool containing the two RAID-1 volumes as its two members.
>
> Repro:
> -Fail the first device (RAID-1 volume) in the zpool.
> -Wait 30secs-1 minute.
> -Fail the second device (RAID-1 volume) in the zpool.
>
> Result:
> -Immediate kernel panic & kernel dump.
> -The ALOM on the T2000 registered this error code:
> SC Alert: Host detected fault, MSGID: ZFS-8000-CS
>
> Should I post this as a bug? Or is it known already?

Jason,
if you could show us the panic message and stack, that would be a good start.

# mdb -k [panic number, eg 0]
::status
::msgbuf
*panic_thread::findstack -v

cheers,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
James C. McPherson
2006-Jun-14 20:36 UTC
[zfs-discuss] Re: [zfs-code] ZFS triggers kernel panic on T2000 (SXb41)
Jason Williams wrote:
> Hi James,
>
> Thanks for the quick response! Please find the requested info in the
> attached log file. Also, thank you for giving the commands you needed
> run. Not very adept at Solaris debugging yet. :-) If you need anything
> else please let me know.
>
> -----Original Message-----
> From: James C. McPherson [mailto:James.McPherson at Sun.COM]
> Sent: Wed 6/14/2006 2:41 AM
> To: Jason Williams
> Cc: zfs-code at opensolaris.org; ZFS Discussions
> Subject: Re: [zfs-code] ZFS triggers kernel panic on T2000 (SXb41)
>
> Jason Williams wrote:
>> Setup:
>> -T2000 running Solaris Express Build 41.
>> -Qlogic 2342 HBA (using both ports multipathed via MPXIO).
>> -StorageTek FLX210 (Engenio 2882) FC array sliced into two 6 disk RAID-1 volumes (multipathed via MPXIO).
>> -Brocade SilkWorm 3850 running FabricOS 4.2.0.
>> -Created a striped ZFS zpool containing the two RAID-1 volumes as its two members.

Hi Jason,
There are a number of issues logged in SunSolve about the panic message you saw:

ZFS: I/O failure (write on <unknown> off 0: zio 6001331cf80 [L0 unallocated] 4000L/600P DVA[0]=<1:2c600:600> DVA[1]=<0:2c400:600> fletcher4 lzjb BE contiguous birth=91 fill=0 cksum=71144c5a86:60016fa

and they're all to do with hardware failures of one sort or another.

What concerns me though is that from your description of your pool:

"zpool containing the two RAID-1 volumes as its two members."

it appears that you don't have any mirroring set up at the ZFS level. If that is the case, then the behaviour you noticed is exactly what should happen, because your pool has no protection against the removal or failure (for whatever reason) of a device from that pool.

Could you please clarify whether that is in fact the case with your pool config? The output of "zpool status -v" will help here.

best regards,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
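The point about pool-level redundancy can be illustrated with a small model. This is only a sketch, not ZFS code: the `Vdev` class, the `pool_available` function, and the device names are hypothetical, and it assumes the simplified rule that a pool stripes data across all of its top-level devices, so it stays usable only while every top-level device is usable, whereas a mirror stays usable while at least one side is.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vdev:
    """Hypothetical stand-in for a ZFS virtual device (not a real ZFS API)."""
    name: str
    kind: str = "disk"  # "disk" (a leaf device or LUN) or "mirror"
    children: List["Vdev"] = field(default_factory=list)
    failed: bool = False

    def available(self) -> bool:
        if self.kind == "mirror":
            # A mirror survives as long as at least one side is available.
            return any(child.available() for child in self.children)
        return not self.failed


def pool_available(top_level_vdevs: List[Vdev]) -> bool:
    # Data is striped across all top-level vdevs, so losing any one of
    # them makes the pool unavailable.
    return all(vdev.available() for vdev in top_level_vdevs)


# Layout described in the thread: two array LUNs striped together.
lun_a, lun_b = Vdev("lun_a"), Vdev("lun_b")
striped = [lun_a, lun_b]
lun_a.failed = True
print(pool_available(striped))   # False

# Alternative layout: two LUNs joined as a single mirror.
lun_c, lun_d = Vdev("lun_c"), Vdev("lun_d")
mirrored = [Vdev("mirror-0", kind="mirror", children=[lun_c, lun_d])]
lun_c.failed = True
print(pool_available(mirrored))  # True
lun_d.failed = True
print(pool_available(mirrored))  # False
```

In this model the array's own RAID-1 protects each LUN against a single disk failure, but once a whole LUN is failed the striped pool has nothing left to fall back on.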
Robert Milkowski
2006-Jun-21 14:23 UTC
[zfs-code] Re: ZFS triggers kernel panic on T2000 (SXb41)
You failed two disks in the same raid-1 group? If so, then unfortunately ZFS is "expected" to panic the system.
Jason Williams
2006-Jun-21 22:39 UTC
[zfs-code] Re: ZFS triggers kernel panic on T2000 (SXb41)
Yeah...the ZFS team has since informed me of that... :-/ Thank you for the reply though! It seems to me to be highly undesirable behavior when you're running multiple zpools and multiple containers on top of them. As long as you've got good pools running (and services on top of them), you probably don't want the system panicking. But that's just my two cents.
Robert Milkowski
2006-Jun-28 14:47 UTC
[zfs-code] Re: ZFS triggers kernel panic on T2000 (SXb41)
I agree with you, and it was discussed on zfs-discuss before. In some of our environments I really hate that behaviour; fortunately it hasn't happened (yet). According to Eric, it's hard to implement right now but could be possible in the future.