This didn't occur on a production server, but I thought I'd
post this anyway because it might be interesting.
I'm currently testing a ZFS NAS machine consisting of a Dell R710 with
two Dell 5/E SAS HBAs. Right now I'm in the middle of torture testing
the system, simulating drive failures, exporting the storage pool, rearranging
the disks in different slots, and what have you. Up until now, everything has
been going swimmingly.
Here was my original zpool configuration:
        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c1t1d0   ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
            c1t4d0   ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t36d0  ONLINE       0     0     0
I exported the tank zpool, rearranged the drives in the chassis, and reimported
it. I ended up with this:
        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
            c1t1d0   ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c1t4d0   ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t48d0  ONLINE       0     0     0
Great. No problems.
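For the record, the shuffle itself was nothing exotic; it was just the standard
export/import cycle:
# zpool export tank
(rearrange the drives in the chassis)
# zpool import tank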
Next, I took c2t48d0 offline and then unconfigured it with cfgadm.
# zpool offline tank c2t48d0
# cfgadm -c unconfigure c2::dsk/c2t48d0
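(If you want to sanity-check the attachment point name before unconfiguring,
cfgadm itself will list it; something like
# cfgadm -al | grep c2t48d0
should show the c2::dsk/c2t48d0 entry.)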
I checked the status next.
# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          raidz2     ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
            c1t1d0   ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
          raidz2     DEGRADED     0     0     0
            c1t4d0   ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t48d0  OFFLINE      0     0     0
I went back and reconfigured the drive in cfgadm.
# cfgadm -c configure c2::dsk/c2t48d0
I was surprised at this point because I didn't have to run zpool
replace. As soon as I reconfigured the drive in cfgadm, ZFS resilvered the
zpool without any action from me.
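I had expected to have to bring the disk back by hand with something along the
lines of
# zpool online tank c2t48d0
or, if it had been an actual replacement drive,
# zpool replace tank c2t48d0
but neither turned out to be necessary.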
# zpool status tank
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Tue Nov 10 15:33:08 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
            c1t1d0   ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c1t4d0   ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t48d0  ONLINE       0     0     0  3K resilvered
I wanted to destroy this zpool and reconfigure it differently - but when I tried
I got this error:
# zpool destroy tank
cannot unmount '/tank': Device busy
could not destroy 'tank': could not unmount datasets
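(In hindsight, I probably should have chased down whatever was holding /tank
open before doing anything drastic. I assume something like
# fuser -c /tank
would have named the offending process, and
# zpool destroy -f tank
would have forced the unmount, but I didn't try either at the time.)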
Hmm. That's interesting. Let's reboot the system and see what
happens. Upon reboot, this is what tank looks like:
# zpool status tank
  pool: tank
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        tank         UNAVAIL      0     0     0  insufficient replicas
          raidz2     UNAVAIL      0     0     0  insufficient replicas
            c2t31d0  FAULTED      0     0     0  corrupted data
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
            c1t1d0   FAULTED      0     0     0  corrupted data
            c1t12d0  FAULTED      0     0     0  corrupted data
            c1t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t5d0   FAULTED      0     0     0  corrupted data
            c1t11d0  ONLINE       0     0     0
            c2t25d0  FAULTED      0     0     0  corrupted data
          raidz2     DEGRADED     0     0     0
            c1t4d0   FAULTED      0     0     0  corrupted data
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c1t10d0  FAULTED      0     0     0  corrupted data
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t48d0  ONLINE       0     0     0
Now, my only question is: WTF? Can anyone shed some light on this?
Obviously I won't be pulling any shenanigans once this system is in
production, but if the system loses power and something crazy happens, I
don't want to be seeing this when the system comes back up.
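For what it's worth, my next step was going to be dumping the vdev labels from
one of the "faulted" disks, something like
# zdb -l /dev/rdsk/c2t31d0s0
(assuming the pool lives on slice 0 of the whole-disk label), and then another
export/import to see whether ZFS sorts the device paths out on its own.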