Jorgen Lundman
2009-Aug-06 06:07 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
x4540 snv_117

We lost a HDD last night, and it seemed to take out most of the bus or something and forced us to reboot. (We have yet to experience losing a disk that didn't force a reboot, mind you).

So today, I'm looking at replacing the broken HDD, but no amount of work makes it "turn on the blue LED". After trying that for an hour, we just replaced the HDD anyway. But no amount of work will make it use/recognise it. (We tried more than one working spare HDD too).

For example:

# zpool status

    raidz1      DEGRADED     0     0     0
      c5t1d0    ONLINE       0     0     0
      c0t5d0    ONLINE       0     0     0
      spare     DEGRADED     0     0  285K
        c1t5d0  UNAVAIL      0     0     0  cannot open
        c4t7d0  ONLINE       0     0     0  4.13G resilvered
      c2t5d0    ONLINE       0     0     0
      c3t5d0    ONLINE       0     0     0
    spares
      c4t7d0    INUSE     currently in use

# zpool offline zpool1 c1t5d0

    raidz1      DEGRADED     0     0     0
      c5t1d0    ONLINE       0     0     0
      c0t5d0    ONLINE       0     0     0
      spare     DEGRADED     0     0  285K
        c1t5d0  OFFLINE      0     0     0
        c4t7d0  ONLINE       0     0     0  4.13G resilvered
      c2t5d0    ONLINE       0     0     0
      c3t5d0    ONLINE       0     0     0

# cfgadm -al
Ap_Id             Type       Receptacle   Occupant     Condition
c1                scsi-bus   connected    configured   unknown
c1::dsk/c1t5d0    disk       connected    configured   failed

# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0    disk       connected    configured   failed
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0    disk       connected    configured   failed

# hdadm offline slot 13
 1:    5:    9:    13:   17:   21:   25:   29:   33:   37:   41:   45:
 c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
 ^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al
c1::dsk/c1t5d0    disk       connected    configured   failed

# fmadm faulty
FRU : "HD_ID_47"
      (hc://:product-id=Sun-Fire-X4540:chassis-id=0915AMR048:server-id=x4500-10.unix:serial=9QMB024K:part=SEAGATE-ST35002NSSUN500G-09107B024K:revision=SU0D/chassis=0/bay=47/disk=0)
      faulty

# fmadm repair HD_ID_47
fmadm: recorded repair to HD_ID_47

# format | grep c1t5d0
#

# hdadm offline slot 13
 1:    5:    9:    13:   17:   21:   25:   29:   33:   37:   41:   45:
 c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
 ^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al
c1::dsk/c1t5d0    disk       connected    configured   failed

# ipmitool sunoem led get|grep 13
hdd13.fail.led   | ON
hdd13.ok2rm.led  | OFF

# zpool online zpool1 c1t5d0
warning: device 'c1t5d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

# cfgadm -c disconnect c1::dsk/c1t5d0
cfgadm: Hardware specific failure: operation not supported for SCSI device

Bah, why were they changed to SCSI? Increasing the size of the hammer...

# cfgadm -x replace_device c1::sd37
Replacing SCSI device: /devices/pci@0,0/pci10de,375@b/pci1000,1000@0/sd@5,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? y
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y

# cfgadm -al
c1::dsk/c1t5d0    disk       connected    configured   failed

I am fairly certain that if I reboot, it will all come back ok again. But I would like to believe that I should be able to replace a disk without rebooting on an X4540.

Any other commands I should try?

Lund

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
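For context, the hot-swap sequence that normally applies on these machines looks roughly like the sketch below, using the pool and ap_id names from this thread. This is a generic outline rather than anything from the original post, and the whole problem above is that the unconfigure step never takes effect:

# zpool offline zpool1 c1t5d0             # stop ZFS using the failing disk
# cfgadm -c unconfigure c1::dsk/c1t5d0    # on a healthy slot this normally also lights the blue ok-to-remove LED
  (physically swap the drive in slot 13)
# cfgadm -c configure c1::dsk/c1t5d0      # bring the new disk back under the mpt/sd stack
# zpool replace zpool1 c1t5d0             # resilver onto the new disk in the same slot
# zpool detach zpool1 c4t7d0              # only if the hot spare does not return to the spare list by itself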
Jorgen Lundman
2009-Aug-06 06:48 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
I suspect this is what it is all about:

# devfsadm -v
devfsadm[16283]: verbose: no devfs node or mismatched dev_t for /devices/pci@0,0/pci10de,375@b/pci1000,1000@0/sd@5,0:a
[snip]

and indeed:

brw-r-----   1 root     sys       30, 2311 Aug  6 15:34 sd@4,0:wd
crw-r-----   1 root     sys       30, 2311 Aug  6 15:24 sd@4,0:wd,raw
drwxr-xr-x   2 root     sys              2 Aug  6 14:31 sd@5,0
drwxr-xr-x   2 root     sys              2 Apr 17 17:52 sd@6,0
brw-r-----   1 root     sys       30, 2432 Jul  6 09:50 sd@6,0:a
crw-r-----   1 root     sys       30, 2432 Jul  6 09:48 sd@6,0:a,raw

Perhaps because it was booted with the dead disk in place, it never configured the entire "sd5" mpt driver. Why the other hard-disks work I don't know.

I suspect the only way to fix this, is to reboot again.

Lund

Jorgen Lundman wrote:
> x4540 snv_117
>
> We lost a HDD last night, and it seemed to take out most of the bus or
> something and forced us to reboot. (We have yet to experience losing a
> disk that didn't force a reboot mind you).
>
> So today, I'm looking at replacing the broken HDD, but no amount of work
> makes it "turn on the blue LED". After trying that for an hour, we just
> replaced the HDD anyway. But no amount of work will make it
> use/recognise it. (We tried more than one working spare HDD too).
> [...]
> I am fairly certain that if I reboot, it will all come back ok again.
> But I would like to believe that I should be able to replace a disk
> without rebooting on a X4540.
>
> Any other commands I should try?
>
> Lund

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
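One thing that may be worth trying before the reboot, given the stale /dev links devfsadm is complaining about, is to let devfsadm clean them up and then re-run the configure. A rough sketch using the ap_id from this thread; if the slot hardware itself is dead, as later posts suggest, this will fail in the same way:

# devfsadm -Cv                            # -C removes dangling /dev links left behind by the departed disk
# cfgadm -c configure c1::dsk/c1t5d0      # ask the controller to re-probe the slot and rebuild the device nodes
# cfgadm -al c1::dsk/c1t5d0               # check whether the condition has changed from "failed"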
Brent Jones
2009-Aug-06 07:38 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
On Wed, Aug 5, 2009 at 11:48 PM, Jorgen Lundman <lundman at gmo.jp> wrote:
>
> I suspect this is what it is all about:
>
> # devfsadm -v
> devfsadm[16283]: verbose: no devfs node or mismatched dev_t for
> /devices/pci@0,0/pci10de,375@b/pci1000,1000@0/sd@5,0:a
> [snip]
>
> [...]
>
> Perhaps because it was booted with the dead disk in place, it never
> configured the entire "sd5" mpt driver. Why the other hard-disks work I
> don't know.
>
> I suspect the only way to fix this, is to reboot again.
>
> Lund
>

I have a pair of X4540's also, and getting any kind of drive status, or failure alert is a lost cause. I've opened several cases with Sun with the following issues:

ILOM/BMC can't see any drives (status, FRU, firmware, etc)
FMA cannot see a drive failure (you can pull a drive, and it could be hours before 'zpool status' will show a failed drive, even during a 'zfs scrub')
Hot swapping drives rarely works, system will not see new drive until a reboot

Things I've tried that Sun has suggested:

New BIOS
New controller firmware
New ILOM firmware
Upgrading to new releases of Osol (currently on 118, no luck)
Replacing ILOM card
Custom FMA configs

Nothing works, and my cases with Sun have been open for about 6 months now, with no resolution in sight.

Given that Sun now makes the 7000, I can only assume their support on the more "whitebox" version, AKA X4540, is either near an end, or they don't intend to support any advanced monitoring whatsoever.

Sad, really.. as my $900 Dell and HP servers can send SMS, Jabber messages, SNMP traps, etc, on ANY IPMI event, hardware issue, and what have you without any tinkering or excuses.

-- 
Brent Jones
brent at servuhome.net
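Not a fix for the X4540 instrumentation, but while FMA/ILOM alerting is this unreliable, an external poll of pool health is a cheap stopgap. A rough sketch suitable for cron; the mail address is a placeholder, not something from this thread:

#!/bin/sh
# Mail the admin whenever any pool is not healthy.
# "zpool status -x" prints "all pools are healthy" when there is nothing to report.
STATUS=`/usr/sbin/zpool status -x`
if [ "$STATUS" != "all pools are healthy" ]; then
    echo "$STATUS" | /usr/bin/mailx -s "zpool problem on `hostname`" admin@example.com
fi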
Ross
2009-Aug-06 08:09 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
Whoah!

"We have yet to experience losing a disk that didn't force a reboot"

Do you have any notes on how many times this has happened Jorgen, or what steps you've taken each time?

I appreciate you're probably more concerned with getting an answer to your question, but if ZFS needs a reboot to cope with failures on even an x4540, that's an absolute deal breaker for everything we want to do with ZFS.

Ross
-- 
This message posted from opensolaris.org
Jorgen Lundman
2009-Aug-06 12:47 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
Well, to be fair, there were some special cases. I know we had 3 separate occasions with broken HDDs when we were using UFS. 2 of these appeared to hang, and the 3rd only hung once we replaced the disk. This is most likely due to us using UFS in a zvol (for quotas). We got an IDR patch, and eventually this was released as "UFS 3-way deadlock writing log with zvol". I forget the number right now, but the patch is out.

This is the very first time we have lost a disk in a purely-ZFS system, and I was somewhat hoping that this would be the time everything went smoothly. But it did not.

However, I have also experienced (once) a disk dying in such a way that it took out the chain in a netapp, so perhaps the disk died like this here too (it is really dead). But still disappointing.

Power cycling the x4540 takes about 7 minutes (service to service), but with Sol snv_116(?) and up it can do quiesce-reboots, which take about 57 seconds. In this case, we had to power cycle.

Ross wrote:
> Whoah!
>
> "We have yet to experience losing a
> disk that didn't force a reboot"
>
> Do you have any notes on how many times this has happened Jorgen, or what steps you've taken each time?
>
> I appreciate you're probably more concerned with getting an answer to your question, but if ZFS needs a reboot to cope with failures on even an x4540, that's an absolute deal breaker for everything we want to do with ZFS.
>
> Ross

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
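For anyone weighing the cost of these reboots: the 57-second figure refers to the Fast Reboot path that newer x86 builds support. A sketch of how it is invoked, hedged since the exact build and SMF property details may differ slightly from what is shown here:

# reboot -f                              # fast reboot: skips BIOS/POST and firmware re-initialisation
# svccfg -s "system/boot-config:default" setprop config/fastreboot_default=true
# svcadm refresh svc:/system/boot-config:default    # optional: make plain "reboot"/"init 6" take the fast path by default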
Jorgen Lundman
2009-Aug-20 02:57 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
Finally came to the reboot maintenance to reboot the x4540 to make it see the newly replaced HDD.

I tried reboot, then power-cycle, and reboot -- -r, but I cannot make the x4540 accept any HDD in that bay. I'm starting to think that perhaps we did not lose the original HDD, but rather the slot, and there is a hardware problem.

This is what I see after a reboot; the disk is c1t5d0, sd37, sd@5,0 or slot 13.

c1::dsk/c1t4d0    disk       connected    configured   unknown
c1::dsk/c1t5d0    disk       connected    configured   unknown
c1::dsk/c1t6d0    disk       connected    configured   unknown

# devfsadm -v
devfsadm[893]: verbose: no devfs node or mismatched dev_t for /devices/pci@0,0/pci10de,375@b/pci1000,1000@0/sd@5,0:a
devfsadm[893]: verbose: symlink /dev/dsk/c1t5d0s0 -> ../../devices/pci@0,0/pci10de,375@b/pci1000,1000@0/sd@5,0:a
devfsadm[893]: verbose: no devfs node or mismatched dev_t for /devices/pci@0,0/pci10de,375@b/pci1000,1000@0/sd@5,0:b
devfsadm[893]: verbose: symlink /dev/dsk/c1t5d0s1 -> ../../devices/pci@0,0/pci10de,375@b/pci1000,1000@0/sd@5,0:b
[snip]

Only messages in dmesg are:

Aug 20 02:23:05 x4500-10.unix rootnex: [ID 349649 kern.info] xsvc1 at root: space 0 offset 0
Aug 20 02:23:05 x4500-10.unix genunix: [ID 936769 kern.info] xsvc1 is /xsvc@0,0
Aug 20 02:23:09 x4500-10.unix scsi: [ID 583861 kern.info] sd37 at mpt1: target 5 lun 0
Aug 20 02:23:09 x4500-10.unix genunix: [ID 936769 kern.info] sd37 is /pci@0,0/pci10de,375@b/pci1000,1000@0/sd@5,0
Aug 20 02:23:09 x4500-10.unix pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Aug 20 02:23:09 x4500-10.unix genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
root@x4500-10.unix #
Aug 20 02:23:12 x400-10.unix genunix: WARNING: constraints forbid retire: /pci@3c,0/pci10de,376@f/pci1000,1000@0/sd@7,0

# cd ../../devices/pci@0,0/pci10de,375@b/pci1000,1000@0/
root@x4500-10.unix # ls -l
./sd@5,0:a: No such device or address
./sd@5,0:a,raw: No such device or address
./sd@5,0:b: No such device or address
./sd@5,0:b,raw: No such device or address
./sd@5,0:c: No such device or address
[snip lots, these errors only show up the first time you ls]
total 24
drwxr-xr-x   2 root     sys              2 Apr 17 17:52 sd@0,0
brw-r-----   1 root     sys       30, 2048 Jul  6 09:34 sd@0,0:a
crw-r-----   1 root     sys       30, 2048 Jul  6 09:34 sd@0,0:a,raw
brw-r-----   1 root     sys       30, 2049 Jul  6 09:34 sd@0,0:b
crw-r-----   1 root     sys       30, 2049 Jul  6 09:34 sd@0,0:b,raw
[snip]
crw-r-----   1 root     sys       30, 2067 Jul  6 09:44 sd@0,0:t,raw
brw-r-----   1 root     sys       30, 2068 Jul  6 09:50 sd@0,0:u
crw-r-----   1 root     sys       30, 2068 Jul  6 09:44 sd@0,0:u,raw
drwxr-xr-x   2 root     sys              2 Apr 17 17:52 sd@1,0
brw-r-----   1 root     sys       30, 2112 Jul  6 09:50 sd@1,0:a
crw-r-----   1 root     sys       30, 2112 Jul  6 09:48 sd@1,0:a,raw
brw-r-----   1 root     sys       30, 2113 Jul  6 09:50 sd@1,0:b
[snip]
brw-r-----   1 root     sys       30, 2132 Jul  6 09:50 sd@1,0:u
crw-r-----   1 root     sys       30, 2132 Jul  6 09:48 sd@1,0:u,raw
brw-r-----   1 root     sys       30, 2119 Aug 20 02:23 sd@1,0:wd
crw-r-----   1 root     sys       30, 2119 Aug 20 02:23 sd@1,0:wd,raw
drwxr-xr-x   2 root     sys              2 Apr 17 17:52 sd@2,0
brw-r-----   1 root     sys       30, 2176 Jul  6 09:50 sd@2,0:a
crw-r-----   1 root     sys       30, 2176 Jul  6 09:48 sd@2,0:a,raw
brw-r-----   1 root     sys       30, 2177 Jul  6 09:50 sd@2,0:b
[snip]
brw-r-----   1 root     sys       30, 2196 Jul  6 09:50 sd@2,0:u
crw-r-----   1 root     sys       30, 2196 Jul  6 09:48 sd@2,0:u,raw
brw-r-----   1 root     sys       30, 2183 Aug 20 02:23 sd@2,0:wd
crw-r-----   1 root     sys       30, 2183 Aug 20 02:23 sd@2,0:wd,raw
drwxr-xr-x   2 root     sys              2 Apr 17 17:52 sd@3,0
brw-r-----   1 root     sys       30, 2240 Jul  2 15:30 sd@3,0:a
crw-r-----   1 root     sys       30, 2240 Jul  6 09:48 sd@3,0:a,raw
brw-r-----   1 root     sys       30, 2241 Jul  6 09:50 sd@3,0:b
[snip]
brw-r-----   1 root     sys       30, 2260 Jul  6 09:50 sd@3,0:u
crw-r-----   1 root     sys       30, 2260 Jul  6 09:48 sd@3,0:u,raw
brw-r-----   1 root     sys       30, 2247 Jul  6 09:50 sd@3,0:wd
crw-r-----   1 root     sys       30, 2247 Jul  6 09:43 sd@3,0:wd,raw
drwxr-xr-x   2 root     sys              2 Apr 17 17:52 sd@4,0
brw-r-----   1 root     sys       30, 2304 Jul  6 09:50 sd@4,0:a
crw-r-----   1 root     sys       30, 2304 Jul  6 09:48 sd@4,0:a,raw
brw-r-----   1 root     sys       30, 2305 Jul  6 09:50 sd@4,0:b
[snip]
brw-r-----   1 root     sys       30, 2324 Jul  6 09:50 sd@4,0:u
crw-r-----   1 root     sys       30, 2324 Jul  6 09:48 sd@4,0:u,raw
brw-r-----   1 root     sys       30, 2311 Aug 20 02:23 sd@4,0:wd
crw-r-----   1 root     sys       30, 2311 Aug 20 02:23 sd@4,0:wd,raw
drwxr-xr-x   2 root     sys              2 Aug  6 14:31 sd@5,0
drwxr-xr-x   2 root     sys              2 Apr 17 17:52 sd@6,0
brw-r-----   1 root     sys       30, 2432 Jul  6 09:50 sd@6,0:a
crw-r-----   1 root     sys       30, 2432 Jul  6 09:48 sd@6,0:a,raw
brw-r-----   1 root     sys       30, 2433 Jul  6 09:50 sd@6,0:b
crw-r-----   1 root     sys       30, 2433 Jul  6 09:48 sd@6,0:b,raw
brw-r-----   1 root     sys       30, 2434 Jul  6 09:50 sd@6,0:c
[snip]
brw-r-----   1 root     sys       30, 2452 Jul  6 09:50 sd@6,0:u
crw-r-----   1 root     sys       30, 2452 Jul  6 09:48 sd@6,0:u,raw
brw-r-----   1 root     sys       30, 2439 Aug 20 02:24 sd@6,0:wd
crw-r-----   1 root     sys       30, 2439 Aug 20 02:23 sd@6,0:wd,raw
drwxr-xr-x   2 root     sys              2 Apr 17 17:52 sd@7,0
brw-r-----   1 root     sys       30, 2496 Jul  2 15:30 sd@7,0:a
crw-r-----   1 root     sys       30, 2496 Jul  6 09:48 sd@7,0:a,raw
brw-r-----   1 root     sys       30, 2497 Jul  6 09:50 sd@7,0:b
crw-r-----   1 root     sys       30, 2497 Jul  6 09:48 sd@7,0:b,raw
brw-r-----   1 root     sys       30, 2498 Jul  6 09:50 sd@7,0:c
crw-r-----   1 root     sys       30, 2498 Jul  6 09:43 sd@7,0:c,raw
brw-r-----   1 root     sys       30, 2499 Jul  6 09:50 sd@7,0:d
crw-r-----   1 root     sys       30, 2499 Jul  6 09:48 sd@7,0:d,raw
brw-r-----   1 root     sys       30, 2500 Jul  6 09:50 sd@7,0:e
crw-r-----   1 root     sys       30, 2500 Jul  6 09:48 sd@7,0:e,raw

So it seems sd@5,0 is empty; it is peculiar that all other HDDs on c1tX work though.
Eventually I notice that cfgadm goes to:

c1::dsk/c1t4d0    disk       connected    configured   unknown
c1::dsk/c1t5d0    disk       connected    configured   failed
c1::dsk/c1t6d0    disk       connected    configured   unknown

We promoted the Spare in use to replace c1t5d0, so now the pool looks like:

  pool: zpool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool1      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c2t7d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0   [was c1t5d0]
            c2t5d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
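For anyone wondering what promoting the in-use spare means in zpool terms: once the spare has finished resilvering, detaching the failed device turns the spare into a permanent member of that raidz1 vdev. A minimal sketch with this thread's pool and device names, assuming that is roughly what was done here:

# zpool detach zpool1 c1t5d0       # drop the failed disk; c4t7d0 stops being a spare and stays in the vdev
# zpool status zpool1              # the vdev now lists c4t7d0 where c1t5d0 used to be
# zpool add zpool1 spare c1t5d0    # later, once slot 13 accepts a disk again, put it back as the hot spare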
Ian Collins
2009-Aug-21 08:29 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
Jorgen Lundman wrote:
>
> Finally came to the reboot maintenance to reboot the x4540 to make it
> see the newly replaced HDD.
>
> I tried, reboot, then power-cycle, and reboot -- -r,
>
> but I can not make the x4540 accept any HDD in that bay. I'm starting
> to think that perhaps we did not lose the original HDD, but rather the
> slot, and there is a hardware problem.
>
> This is what I see after a reboot, the disk is c1t5d0, sd37, sd@5,0 or
> slot 13.
>
> c1::dsk/c1t4d0    disk       connected    configured   unknown
> c1::dsk/c1t5d0    disk       connected    configured   unknown
> c1::dsk/c1t6d0    disk       connected    configured   unknown
>
Does format show it?

-- 
Ian.
Jorgen Lundman
2009-Aug-21 12:42 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
Nope, that it does not.

Ian Collins wrote:
> Jorgen Lundman wrote:
>>
>> Finally came to the reboot maintenance to reboot the x4540 to make it
>> see the newly replaced HDD.
>>
>> I tried, reboot, then power-cycle, and reboot -- -r,
>>
>> but I can not make the x4540 accept any HDD in that bay. I'm starting
>> to think that perhaps we did not lose the original HDD, but rather the
>> slot, and there is a hardware problem.
>>
>> This is what I see after a reboot, the disk is c1t5d0, sd37, sd@5,0 or
>> slot 13.
>>
>> c1::dsk/c1t4d0    disk       connected    configured   unknown
>> c1::dsk/c1t5d0    disk       connected    configured   unknown
>> c1::dsk/c1t6d0    disk       connected    configured   unknown
>>
> Does format show it?
>

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
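Since format only lists disks it can read a label from, a couple of lower-level checks can help separate "dead drive" from "dead slot". A sketch using the instance and target names from earlier in the thread, in case these were not already tried off-list:

# iostat -En | grep c1t5d0         # does an sd instance for the device exist at all, and with what error counts
# grep "sd@5,0" /etc/path_to_inst  # was an instance number ever bound to target 5 (may also match other controllers)
# fmdump -e | tail                 # recent FMA ereports may show transport errors pointing at the slot or controller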
Ian Collins
2009-Aug-21 20:37 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
Jorgen Lundman wrote:
> Ian Collins wrote:
>> Jorgen Lundman wrote:
>>>
>>> Finally came to the reboot maintenance to reboot the x4540 to make
>>> it see the newly replaced HDD.
>>>
>>> I tried, reboot, then power-cycle, and reboot -- -r,
>>>
>>> but I can not make the x4540 accept any HDD in that bay. I'm
>>> starting to think that perhaps we did not lose the original HDD, but
>>> rather the slot, and there is a hardware problem.
>>>
>>> This is what I see after a reboot, the disk is c1t5d0, sd37, sd@5,0
>>> or slot 13.
>>>
>>> c1::dsk/c1t4d0    disk       connected    configured   unknown
>>> c1::dsk/c1t5d0    disk       connected    configured   unknown
>>> c1::dsk/c1t6d0    disk       connected    configured   unknown
>>>
>> Does format show it?
>>
> Nope, that it does not.
>
Time to call the repair man!

-- 
Ian.
John Ryan
2009-Sep-18 12:15 UTC
[zfs-discuss] x4540 dead HDD replacement, remains "configured".
I have exactly these symptoms on 3 thumpers now: 2 x X4540 and 1 x X4500. Rebooting/power cycling doesn't even bring them back.

The only thing I found is that if I boot from the osol.2009.06 CD, I can see all the drives. I had to reinstall the OS on one box.

I've only just recently upgraded them to snv_122. Before that, I could change disks without problems. Could it be something introduced since snv_111?

John
-- 
This message posted from opensolaris.org
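If a regression between snv_111 and snv_122 is the suspicion, one relatively cheap test, assuming the machines were upgraded via boot environments rather than reinstalled (the post does not say), is to boot back into the older BE and retry a disk swap. The BE name below is hypothetical:

# beadm list                       # show existing boot environments and which one is active
# beadm activate opensolaris-111   # hypothetical name of the pre-upgrade boot environment
# init 6                           # reboot into it, then retry the cfgadm/hot-swap steps on the same slot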