SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Apr 17 12:25:49 PDT 2007
PLATFORM: SUNW,Sun-Fire-880, CSN: -, HOSTNAME: twinkie
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: ce624168-b522-e35b-d4e8-a8e4b9169ad1
DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: The pool data is unavailable
REC-ACTION: Run 'zpool status -x' and either attach the missing device or restore from backup.

twinkie># zpool status -x
all pools are healthy

twinkie># zpool status -v
  pool: tank
 state: FAULTED
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           UNAVAIL      0     0     0  insufficient replicas
          raidz2       UNAVAIL      0     0     0  corrupted data
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    ONLINE       0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0

-->> I'm current on all patches as of yesterday:
SunOS twinkie 5.10 Generic_125100-04 sun4u sparc SUNW,Sun-Fire-880

-->> I'll put the rest in an attachment, as this will be a long post.

-->> This problem was with a V880 with attached storage, an A5200. The V880 is configured with 2 disk bays, and the A5200 is connected via fibre to the 2nd channel on the PCI-X dual host bus adapter.

-->> I have to admit I probably, unknowingly at the time, caused my own problem; however, I've now seen the zfs status out of sync on 2 different occasions.

1st time:
>---------------------------------<
SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Mon Apr 16 15:36:14 PDT 2007
PLATFORM: SUNW,Sun-Fire-880, CSN: -, HOSTNAME: twinkie
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 9448ced6-4dea-c3ba-e13a-f9028f14b328
DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: The pool data is unavailable
REC-ACTION: Run 'zpool status -x' and either attach the missing device or restore from backup.
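-->> (Aside for anyone chasing the same mismatch: the diagnosis behind one of these console messages can be pulled straight out of FMA. This is a generic sketch rather than a capture from twinkie, using the EVENT-ID from the message above; the output format varies by release.)

# fmadm faulty
# fmdump -v -u 9448ced6-4dea-c3ba-e13a-f9028f14b328

-->> fmadm faulty lists the resources FMA currently believes are faulted, and fmdump -v -u <uuid> prints the logged events for that particular diagnosis, which is handy when FMA and zpool status disagree.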
twinkie console login: root
[twinkie] # bash

twinkie># zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    ONLINE       0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0

errors: No known data errors

twinkie># zpool status -x
all pools are healthy

-->> I checked Sunsolve for any reason for this and saw someone with a similar issue on Solaris 10 11/06 at kernel rev -36, so I patched the system to current and didn't see the error again.

2nd time
-------------------------
-->> Then, looking for a reason/solution to wait time with Oracle, I upgraded the firmware on my disks. After applying the firmware and rebooting, I got errors on the console:

WARNING: /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w21000004cf726fc1,0 (ssd5):
        Corrupt label; wrong magic number
WARNING: /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w21000004cf727925,0 (ssd3):
        Corrupt label; wrong magic number

-->> Upon logging in I ran format:

twinkie># format
Searching for disks...done

c2t9d0: configured with capacity of 33.92GB
c2t13d0: configured with capacity of 33.92GB

-->> I used format to label the disks and then ran zpool status.

twinkie># zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     UNAVAIL      0     0     0  cannot open
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    UNAVAIL      0     0     0  cannot open
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0

errors: No known data errors

-->> So I attempted to online one of the disks:

twinkie># zpool online tank c2t9d0
Bringing device c2t9d0 online

twinkie># zpool status
  pool: tank
 state: FAULTED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 19 errors on Tue Apr 17 12:08:25 2007
config:

        NAME           STATE     READ WRITE CKSUM
        tank           UNAVAIL      0     0     0  insufficient replicas
          raidz2       UNAVAIL      0     0     0  corrupted data
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0    32     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    ONLINE       0    24     0
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0

-->> This is when the system panicked and rebooted.

SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Apr 17 12:08:25 PDT 2007
PLATFORM: SUNW,Sun-Fire-880, CSN: -, HOSTNAME: twinkie
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: a96cb915-8e49-65b9-d575-b0c8ba271891
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.
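-->> (In hindsight, onlining a freshly relabeled disk was probably the wrong move. Since the labels had just been rewritten, treating each disk as a replacement should have forced a clean resilver from the raidz2 parity instead. A sketch, not from my console; syntax per zpool(1M):)

# zpool replace tank c2t9d0
# zpool replace tank c2t13d0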
panic[cpu1]/thread=2a100e4dcc0: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 394

000002a100e4d560 genunix:assfail+74 (7b64e330, 7b64e380, 18a, 183d800, 11ed400, 0)
  %l0-3: 0000000000000000 000000000000000f 000000000000000a 0000000000000000
  %l4-7: 00000000011ed400 0000000000000000 000000000186fc00 0000000000000000
000002a100e4d610 zfs:zfsctl_ops_root+b1a9fb0 (300043021a8, f, 11a450, 10, 30009089000, 30007f37640)
  %l0-3: 0000000000000001 000000000000000f 0000000000000007 0000000000000502
  %l4-7: 0000030000074300 0000000000000000 0000000000000501 0000000000000000
000002a100e4d6e0 zfs:space_map_sync+278 (300036366f8, 3, 300036364a0, 10, 2, 48)
  %l0-3: 0000000000000010 0000030009089000 0000030009089010 0000030009089048
  %l4-7: 00007fffffffffff 0000000000007fff 0000000000000006 0000000000000010
000002a100e4d7d0 zfs:metaslab_sync+200 (30003636480, 805de, 8, 30007f37640, 30004583040, 30003774dc0)
  %l0-3: 00000300043021a8 00000300036364b8 00000300036364a0 0000030003636578
  %l4-7: 00000300036366f8 0000030003636698 00000300036367b8 0000000000000006
000002a100e4d890 zfs:vdev_sync+90 (30004583040, 805de, 805dd, 30003636480, 30004583288, d)
  %l0-3: 00000000018a7550 0000000000000007 0000030003774ea8 0000000000000002
  %l4-7: 0000030004583040 0000030003774dc0 0000000000000000 0000000000000000
000002a100e4d940 zfs:spa_sync+1d0 (30003774dc0, 805de, 1, 0, 2a100e4dcc4, 1)
  %l0-3: 0000030003774f80 0000030003774f90 0000030003774ea8 000003000851b880
  %l4-7: 0000000000000000 0000030004582b00 00000300043c6740 0000030003774f40
000002a100e4da00 zfs:txg_sync_thread+134 (300043c6740, 805de, 0, 2a100e4dab0, 300043c6850, 300043c6852)
  %l0-3: 00000300043c6860 00000300043c6810 0000000000000000 00000300043c6818
  %l4-7: 00000300043c6856 00000300043c6854 00000300043c6808 00000000000805de

-->> Upon reboot, the status -x and status -v output didn't sync up.

twinkie># zpool status -v
  pool: tank
 state: FAULTED
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           UNAVAIL      0     0     0  insufficient replicas
          raidz2       UNAVAIL      0     0     0  corrupted data
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            c2t13d0    ONLINE       0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c3t0d0s1   ONLINE       0     0     0
            c3t1d0s0   ONLINE       0     0     0
            c3t1d0s1   ONLINE       0     0     0
            c3t2d0s0   ONLINE       0     0     0
            c3t2d0s1   ONLINE       0     0     0
            c3t3d0s0   ONLINE       0     0     0
            c3t3d0s1   ONLINE       0     0     0
            c3t4d0s0   ONLINE       0     0     0
            c3t4d0s1   ONLINE       0     0     0
            c3t5d0s0   ONLINE       0     0     0
            c3t5d0s1   ONLINE       0     0     0
            c3t6d0s0   ONLINE       0     0     0
            c3t6d0s1   ONLINE       0     0     0
            c3t7d0s0   ONLINE       0     0     0
            c3t7d0s1   ONLINE       0     0     0
            c3t16d0s0  ONLINE       0     0     0
            c3t16d0s1  ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0
            c3t17d0s1  ONLINE       0     0     0
            c3t18d0s0  ONLINE       0     0     0
            c3t18d0s1  ONLINE       0     0     0
            c3t19d0s0  ONLINE       0     0     0
            c3t19d0s1  ONLINE       0     0     0
            c3t20d0s0  ONLINE       0     0     0
            c3t20d0s1  ONLINE       0     0     0
            c3t21d0s0  ONLINE       0     0     0
            c3t21d0s1  ONLINE       0     0     0
            c3t22d0s0  ONLINE       0     0     0
            c3t22d0s1  ONLINE       0     0     0
            c3t23d0s0  ONLINE       0     0     0
            c3t23d0s1  ONLINE       0     0     0

twinkie># zpool status -x
all pools are healthy

-->> I tried clearing the faults in fmadm and rebooting, with the same results. I was unable to bring the pool back online using any zpool commands.

-->> With the help of a friend (thanks, Rick) we were able to determine that the 2 disks I had labeled carried SMI labels instead of EFI labels. After repartitioning the 2 disks and labeling them with EFI, the proper labels were attached. Once that was done, I cleared the fmadm faults and rebooted.
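-->> (For anyone else who hits this: format only offers the EFI label choice in expert mode, and FMA keeps reporting the fault until it is marked repaired. The following is a generic sketch rather than a capture from twinkie; prompts differ by release, and the UUID is just the one from the ZFS-8000-D3 message above, used as an example:)

# format -e c2t9d0        (expert mode; the label menu then offers SMI vs. EFI)
# fmadm repair a96cb915-8e49-65b9-d575-b0c8ba271891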
Thank goodness the 1.3TB pool came back online.

-->> I'll be adding spares and redoing this, but for now I don't have to recreate the entire thing.
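-->> (For the spares, the syntax should be roughly the following, assuming a ZFS release with hot-spare support; c3t24d0 is just a placeholder device name:)

# zpool add tank spare c3t24d0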