Hi,

Using a ZFS emulated volume, I wasn't expecting to see a system [1] hang caused by a SCSI error. What do you think? The error is not systematic. When it happens, the Solaris/Xen dom0 console keeps displaying the following message and the system hangs.

Aug  3 11:11:23 jesma58 scsi: WARNING: /pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@1,0 (sd2):
Aug  3 11:11:23 jesma58     Error for Command: read(10)    Error Level: Retryable
Aug  3 11:11:23 jesma58 scsi:   Requested Block: 67679394    Error Block: 67679394
Aug  3 11:11:23 jesma58 scsi:   Vendor: SEAGATE    Serial Number: 3JA7XWQY
Aug  3 11:11:23 jesma58 scsi:   Sense Key: Unit_Attention
Aug  3 11:11:23 jesma58 scsi:   ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x4

Not sure whether the error happens when the ZFS emulated volume is accessed by Solaris/Xen dom0 or by the Linux/Xen domU guest OS...?

Here is the usage scenario:

1 - Created a ZFS emulated volume tank/vol2:

    # zfs create -V 5gb tank/vol2

2 - Copied an x86 boot sector disk image into that volume:

    # file disk.img
    disk.img: DOS executable (COM)
    # ls -l disk.img
    -rw-r--r--   1 ppetit   icnc     5368709120 Aug  2 18:45 disk.img
    # dd if=disk.img of=/dev/zvol/dsk/tank/vol2 bs=8192
    # zfs get all tank/vol2
    NAME       PROPERTY       VALUE                  SOURCE
    tank/vol2  type           volume                 -
    tank/vol2  creation       Wed Aug  2 18:37 2006  -
    tank/vol2  used           5.04G                  -
    tank/vol2  available      5.83G                  -
    tank/vol2  referenced     5.04G                  -
    tank/vol2  compressratio  1.00x                  -
    tank/vol2  reservation    5G                     local
    tank/vol2  volsize        5G                     -
    tank/vol2  volblocksize   8K                     -
    tank/vol2  checksum       on                     default
    tank/vol2  compression    off                    default
    tank/vol2  readonly       off                    default

3 - Boot a Linux Xen domU kernel on that volume, which contains an ext3fs rootfs partition and a swap partition.

Thanks,
Patrick

----------------------------------------
[1] SunOS 5.11 matrix-build-2006-07-14 i86xen i386 i86xen on a V20Z machine.

--
Patrick Petit
Sun Microsystems Inc.
Labs, CTO ICNC Grenoble (http://icncweb.france)
Phone: (+33)476 188 232 x38232
Fax: (+33)476 188 282
180, Avenue de l'Europe
38334 Saint-Ismier Cedex, France
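(For context on step 3: the zvol would typically be handed to the guest through the domU configuration as a phy: backend. A minimal sketch of such a config follows; the file name, guest name, memory size, kernel/ramdisk paths and guest device names are hypothetical, only the zvol path comes from the scenario above.)

    # cat /etc/xen/linux-domu.cfg          (hypothetical example, not from the original post)
    name    = "linux-domu"                 # guest name - placeholder
    memory  = 512
    kernel  = "/xen/vmlinuz-2.6-xenU"      # guest kernel path - assumption
    ramdisk = "/xen/initrd-2.6-xenU.img"   # assumption
    disk    = [ 'phy:/dev/zvol/dsk/tank/vol2,xvda,w' ]   # the zvol created in step 1
    root    = "/dev/xvda1 ro"              # ext3 root partition inside the disk image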
Patrick Petit wrote:

> Hi,
>
> Using a ZFS emulated volume, I wasn't expecting to see a system [1]
> hang caused by a SCSI error. What do you think? The error is not
> systematic. When it happens, the Solaris/Xen dom0 console keeps
> displaying the following message and the system hangs.
>
> Aug  3 11:11:23 jesma58 scsi: WARNING:
> /pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@1,0 (sd2):
> Aug  3 11:11:23 jesma58     Error for Command: read(10)    Error Level: Retryable
> Aug  3 11:11:23 jesma58 scsi:   Requested Block: 67679394    Error Block: 67679394
> Aug  3 11:11:23 jesma58 scsi:   Vendor: SEAGATE    Serial Number: 3JA7XWQY
> Aug  3 11:11:23 jesma58 scsi:   Sense Key: Unit_Attention
> Aug  3 11:11:23 jesma58 scsi:   ASC: 0x29 (bus device reset message
> occurred), ASCQ: 0x3, FRU: 0x4

Have you looked into this further using FMA, using fmadm to start with?

Darren
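(Aside from fmadm faulty, the raw error telemetry is worth a look; even when nothing has been diagnosed as a fault, the driver's ereports usually land in the FMA error log. A quick sketch using the standard FMA tools:)

    # fmdump -e        # one-line summary of error reports (ereports)
    # fmdump -eV       # full detail, including the device path and sense data
    # fmadm faulty -a  # any faults actually diagnosed from those reports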
Darren Reed wrote:

> Patrick Petit wrote:
>
>> Hi,
>>
>> Using a ZFS emulated volume, I wasn't expecting to see a system [1]
>> hang caused by a SCSI error. What do you think? The error is not
>> systematic. When it happens, the Solaris/Xen dom0 console keeps
>> displaying the following message and the system hangs.
>>
>> Aug  3 11:11:23 jesma58 scsi: WARNING:
>> /pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@1,0 (sd2):
>> Aug  3 11:11:23 jesma58     Error for Command: read(10)    Error Level: Retryable
>> Aug  3 11:11:23 jesma58 scsi:   Requested Block: 67679394    Error Block: 67679394
>> Aug  3 11:11:23 jesma58 scsi:   Vendor: SEAGATE    Serial Number: 3JA7XWQY
>> Aug  3 11:11:23 jesma58 scsi:   Sense Key: Unit_Attention
>> Aug  3 11:11:23 jesma58 scsi:   ASC: 0x29 (bus device reset message
>> occurred), ASCQ: 0x3, FRU: 0x4
>
> Have you looked into this further using FMA, using fmadm to start with?

fmadm shows no error :-(

jesma58# fmadm faulty -a
  STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
jesma58#

> Darren

--
Patrick Petit
Sun Microsystems Inc.
Labs, CTO - G2 Systems Exp. ICNC Grenoble (http://icncweb.france)
Phone: (+33)476 188 232 x38232
Fax: (+33)476 188 282
180, Avenue de l'Europe
38334 Saint-Ismier Cedex, France
Patrick Petit wrote:

> Darren Reed wrote:
>
>> Patrick Petit wrote:
>>
>>> Using a ZFS emulated volume, I wasn't expecting to see a system [1]
>>> hang caused by a SCSI error. What do you think? The error is not
>>> systematic. When it happens, the Solaris/Xen dom0 console keeps
>>> displaying the following message and the system hangs.
>>>
>>> Aug  3 11:11:23 jesma58 scsi: WARNING:
>>> /pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@1,0 (sd2):
>>> Aug  3 11:11:23 jesma58     Error for Command: read(10)    Error Level: Retryable
>>> Aug  3 11:11:23 jesma58 scsi:   Requested Block: 67679394    Error Block: 67679394
>>> Aug  3 11:11:23 jesma58 scsi:   Vendor: SEAGATE    Serial Number: 3JA7XWQY
>>> Aug  3 11:11:23 jesma58 scsi:   Sense Key: Unit_Attention
>>> Aug  3 11:11:23 jesma58 scsi:   ASC: 0x29 (bus device reset message
>>> occurred), ASCQ: 0x3, FRU: 0x4
>>
>> Have you looked into this further using FMA, using fmadm to start with?
>
> fmadm shows no error :-(
>
> jesma58# fmadm faulty -a
>   STATE RESOURCE / UUID
> -------- ----------------------------------------------------------------------
> jesma58#

I have had a similar issue with the solitary SATA disk which makes
up my zfs root pool - errors such as these send the system to a hang
state (fortunately not a hard hang) and require a break/F1-A + forced
crash to get out of.

As I understand it, ZFS will retry operations based on various settings
such as those in 'sd', and I don't believe there are specific error case
handlers in the ZFS code to deal with issues like this.

OTOH it would be nice to see ZFS invoking an error path immediately
on receipt of a failure like yours or mine. But I fear that this would
detract from the device agnosticism that we presently have.

Patrick, is your pool mirrored? I know that mine isn't, and as a result
I know that I need to expect that I will suffer.

The other thing that I am concerned with in your scenario is that you
are dd-ing a disk image onto a zvol. I'm not sure that this is the
right way to go about it (although I don't know what *is* the right
way to do it).

best regards,
James C. McPherson
--
Solaris Datapath Engineering
Storage Division
Sun Microsystems
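(On the retry behaviour mentioned above: the sd driver's per-command timeout and retry counts are tunable from /etc/system, which mainly changes how long the box stalls before an error is surfaced up the stack. The sketch below is only illustrative; the tunable names and defaults vary between releases, so verify them against your sd driver before setting anything.)

    * /etc/system -- illustrative sketch only; verify the tunable names
    * against your sd(7D) release before setting anything.
    * Per-command timeout in seconds (often 60 by default):
    set sd:sd_io_time=0x14
    * Number of retries before the failure is reported upward:
    set sd:sd_retry_count=3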
James C. McPherson wrote:

> Patrick Petit wrote:
>
>> Darren Reed wrote:
>>
>>> Patrick Petit wrote:
>>>
>>>> Using a ZFS emulated volume, I wasn't expecting to see a system [1]
>>>> hang caused by a SCSI error. What do you think? The error is not
>>>> systematic. When it happens, the Solaris/Xen dom0 console keeps
>>>> displaying the following message and the system hangs.
>>>>
>>>> Aug  3 11:11:23 jesma58 scsi: WARNING:
>>>> /pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@1,0 (sd2):
>>>> Aug  3 11:11:23 jesma58     Error for Command: read(10)    Error Level: Retryable
>>>> Aug  3 11:11:23 jesma58 scsi:   Requested Block: 67679394    Error Block: 67679394
>>>> Aug  3 11:11:23 jesma58 scsi:   Vendor: SEAGATE    Serial Number: 3JA7XWQY
>>>> Aug  3 11:11:23 jesma58 scsi:   Sense Key: Unit_Attention
>>>> Aug  3 11:11:23 jesma58 scsi:   ASC: 0x29 (bus device reset message
>>>> occurred), ASCQ: 0x3, FRU: 0x4
>>>
>>> Have you looked into this further using FMA, using fmadm to start with?
>>
>> fmadm shows no error :-(
>>
>> jesma58# fmadm faulty -a
>>   STATE RESOURCE / UUID
>> -------- ----------------------------------------------------------------------
>> jesma58#
>
> I have had a similar issue with the solitary SATA disk which makes
> up my zfs root pool - errors such as these send the system to a hang
> state (fortunately not a hard hang) and require a break/F1-A + forced
> crash to get out of.
>
> As I understand it, ZFS will retry operations based on various settings
> such as those in 'sd', and I don't believe there are specific error case
> handlers in the ZFS code to deal with issues like this.

I am wondering to what extent it is the role of ZFS to fix SCSI
controller errors. Shouldn't that be the role of the controller driver,
or even the controller itself? I would expect that in such circumstances
the lower layers repair and/or isolate the faulty block, for instance by
reassigning it. But, having written SCSI drivers in the past (in my
defense, that was a long time ago), I don't recall drivers being that
elaborate, so the upper layers were left to deal with the hot potato :-(

> OTOH it would be nice to see ZFS invoking an error path immediately
> on receipt of a failure like yours or mine. But I fear that this would
> detract from the device agnosticism that we presently have.
>
> Patrick, is your pool mirrored? I know that mine isn't, and as a result
> I know that I need to expect that I will suffer.

No, it's not mirrored. It's a simple pool backed by a physical disk drive.

> The other thing that I am concerned with in your scenario is that you
> are dd-ing a disk image onto a zvol. I'm not sure that this is the
> right way to go about it (although I don't know what *is* the right
> way to do it).

Yes, I am wondering the same. Would it be preferable to dd onto the raw
(rdsk) device?

> best regards,
>
> James C. McPherson
> --
> Solaris Datapath Engineering
> Storage Division
> Sun Microsystems
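(On the rdsk question: a zvol is exposed under both /dev/zvol/dsk and /dev/zvol/rdsk, so the copy from step 2 can be pointed at the character device directly. A minimal sketch; the larger block size is just a common choice for raw devices, not a requirement:)

    # dd if=disk.img of=/dev/zvol/rdsk/tank/vol2 bs=1048576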
I have similar problems... I have a bunch of D1000 disk shelves attached via
SCSI HBAs to a V880. If I do something as simple as unplug a drive in a raidz
vdev, it generates SCSI errors that eventually freeze the entire system. I can
access the filesystem okay for a couple of minutes until the SCSI bus resets,
then I have a frozen box. I have to stop-a/sync/reset.

If I offline the device before unplugging the drive, I have no problems.

Yeah, sure, I know you're supposed to offline it first, but I'm trying to test
unexpected failures. If the power supplies fail on one of my shelves, the data
will be intact, but the system will hang. This is good, but not great, since I
really want this to be a high-availability system.

I believe this is a failure of the OS, controller, or SCSI driver to isolate
the bad device and let the rest of the system operate, rather than a ZFS issue.


This message posted from opensolaris.org
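(For reference, the offline step Brad describes is the standard zpool subcommand; something along these lines, with the pool and device names as placeholders for your own layout:)

    # zpool offline tank c2t1d0    # take the disk out of service before pulling it
    ...pull, reseat or replace the drive...
    # zpool online tank c2t1d0     # bring it back; ZFS resilvers as needed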
Brad,

I'm investigating a similar issue and would like to get a coredump if you
have one available.

Thanks,
George

Brad Plecs wrote:

> I have similar problems... I have a bunch of D1000 disk shelves attached via
> SCSI HBAs to a V880. If I do something as simple as unplug a drive in a raidz
> vdev, it generates SCSI errors that eventually freeze the entire system. I can
> access the filesystem okay for a couple of minutes until the SCSI bus resets,
> then I have a frozen box. I have to stop-a/sync/reset.
>
> If I offline the device before unplugging the drive, I have no problems.
>
> Yeah, sure, I know you're supposed to offline it first, but I'm trying to test
> unexpected failures. If the power supplies fail on one of my shelves, the data
> will be intact, but the system will hang. This is good, but not great, since I
> really want this to be a high-availability system.
>
> I believe this is a failure of the OS, controller, or SCSI driver to isolate
> the bad device and let the rest of the system operate, rather than a ZFS issue.
The core dump timed out (related to the SCSI bus reset?), so I don't
have one. I can try it again, though, it's easy enough to reproduce.

I was seeing errors on the fibre channel disks as well, so it's possible
the whole thing was locked up.

BP
--
bplecs at cs.umd.edu
Brad,

I have a suspicion about what you might be seeing and I want to confirm
it. If it locks up again you can also collect a threadlist:

echo "$<threadlist" | mdb -k

Send me the output and that will be a good starting point.

Thanks,
George

Brad Plecs wrote:

> The core dump timed out (related to the SCSI bus reset?), so I don't
> have one. I can try it again, though, it's easy enough to reproduce.
>
> I was seeing errors on the fibre channel disks as well, so it's possible
> the whole thing was locked up.
>
> BP
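(If the machine is still responsive when it degrades, the threadlist can be captured to a file and a live crash dump taken without resetting the box; a sketch, assuming a standard dumpadm/savecore setup:)

    # echo "$<threadlist" | mdb -k > /var/tmp/threadlist.out   # save the thread stacks for later analysis
    # savecore -L                                              # live crash dump of the running system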
> Brad,
>
> I have a suspicion about what you might be seeing and I want to confirm
> it. If it locks up again you can also collect a threadlist:
>
> echo "$<threadlist" | mdb -k
>
> Send me the output and that will be a good starting point.

I tried popping out a disk again, but for whatever reason, the system
just became sluggish rather than freezing this time. I didn't get to take
a whole disk shelf offline like I'd done before, because this system went
into production use over the weekend, but I'll send you the threadlist
from the single-disk try privately.

BP
--
bplecs at cs.umd.edu