axa
2006-Jun-02 17:42 UTC
[zfs-discuss] [Probably a bug] zfs disk got UNAVAIL state and cannot be repaired.
Hello All:

I have a disk array with 16 SATA disks in a JBOD configuration, attached to an LSI FC HBA card. I use two raidz groups, each made up of 8 disks. The zpool status result is as follows:

=== zpool status ===

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t0d1  ONLINE       0     0     0
            c6t0d2  ONLINE       0     0     0
            c6t0d3  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t1d1  ONLINE       0     0     0
            c6t1d2  ONLINE       0     0     0
            c6t1d3  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t2d1  ONLINE       0     0     0
            c6t2d2  ONLINE       0     0     0
            c6t2d3  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c6t3d1  ONLINE       0     0     0
            c6t3d2  ONLINE       0     0     0
            c6t3d3  ONLINE       0     0     0

To test ZFS fault tolerance, I took one disk out of the array shelf for about 30 seconds and then inserted it again. ZFS seems to have detected the error and shows me that device c6t1d2's state is UNAVAIL. I am sure that the disk array's "lun mapping" and "config status" are all correct.

=== zpool status ===
-bash-3.00# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
pool                   3.63T   91.1G   3.54T     2%  DEGRADED   -
-bash-3.00# zpool status -x
  pool: pool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Sat Jun 3 00:14:00 2006
config:

        NAME        STATE     READ WRITE CKSUM
        pool        DEGRADED     0     0     0
          raidz     DEGRADED     0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t0d1  ONLINE       0     0     0
            c6t0d2  ONLINE       0     0     0
            c6t0d3  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t1d1  ONLINE       0     0     0
            c6t1d2  UNAVAIL      0     0     0
            c6t1d3  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t2d1  ONLINE       0     0     0
            c6t2d2  ONLINE       0     0     0
            c6t2d3  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c6t3d1  ONLINE       0     0     0
            c6t3d2  ONLINE       0     0     0
            c6t3d3  ONLINE       0     0     0

So, I used "zpool offline pool c6t1d2" to disconnect data access to c6t1d2, intending to then use "zpool online pool c6t1d2" to reattach the disk to the pool. But I got an error when I executed "zpool offline pool c6t1d2":

=== zpool offline ===
-bash-3.00# zpool offline pool c6t1d2
cannot offline 3c6t1d2: no valid replicas

It did NOT work! I tried the "zpool export" and then "zpool import" commands to flush the ZFS memory cache. After "zpool import", the ZFS pool still has the "DEGRADED" status and the c6t1d2 disk state is still UNAVAIL. Oops... there is something wrong...

That's okay. I rebooted the server with the "reboot -- -r" command to reconfigure devices. After the server restarted, the "format" and "cfgadm" commands show me that c6t2d2 is working and is part of the ZFS pool.

=== format ===
-bash-3.00# format -d c6t2d2
Searching for disks...done
selecting c6t2d2
[disk formatted]
/dev/dsk/c6t2d2s0 is part of active ZFS pool pool. Please see zpool(1M).

=== cfgadm ===
-bash-3.00# cfgadm -al c6::dsk/c6t2d1
Ap_Id                Type         Receptacle   Occupant     Condition
c6::dsk/c6t2d1       disk         connected    configured   unknown
-bash-3.00#

And then I checked the ZFS pool:

=== zpool status ===
-bash-3.00# zpool status -x
  pool: pool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Sat Jun 3 00:14:00 2006
config:

        NAME                     STATE     READ WRITE CKSUM
        pool                     DEGRADED     0     0     0
          raidz                  DEGRADED     0     0     0
            c6t0d0               ONLINE       0     0     0
            c6t0d1               ONLINE       0     0     0
            c6t0d2               ONLINE       0     0     0
            c6t0d3               ONLINE       0     0     0
            c6t1d0               ONLINE       0     0     0
            c6t1d1               ONLINE       0     0     0
            3449041879167855716  UNAVAIL      0     0     0  was /dev/dsk/c6t1d2s0
            c6t1d3               ONLINE       0     0     0
          raidz                  ONLINE       0     0     0
            c6t2d0               ONLINE       0     0     0
            c6t2d1               ONLINE       0     0     0
            c6t2d2               ONLINE       0     0     0
            c6t2d3               ONLINE       0     0     0
            c6t3d0               ONLINE       0     0     0
            c6t3d1               ONLINE       0     0     0
            c6t3d2               ONLINE       0     0     0
            c6t3d3               ONLINE       0     0     0

errors: No known data errors
-bash-3.00#

Oops... I have no idea what "3449041879167855716" is. @_@

Okay, I used "zpool offline" again, but unfortunately it did not work:

=== zpool offline ===
-bash-3.00# zpool offline pool 3449041879167855716
cannot offline 3449041879167855716: no valid replicas

It seems Solaris sees the disk device "c6t1d2", but I don't know why "zpool status" shows the c6t1d2 state as UNAVAIL.

I read the "Solaris ZFS Administration Guide" and tried every possible command, but the ZFS pool still shows DEGRADED and c6t1d2 cannot be fixed or replaced by any zpool command.

Does anyone know how to repair this?

This message posted from opensolaris.org
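For readers who want to reproduce the scenario, the layout described above and the recovery step the poster intended can be sketched roughly as follows. This is only an illustration, assuming the pool name "pool" and the device names from the post; it is not the poster's exact session.

    # Sketch only: create a pool with two 8-disk raidz vdevs, as described above.
    zpool create pool \
        raidz c6t0d0 c6t0d1 c6t0d2 c6t0d3 c6t1d0 c6t1d1 c6t1d2 c6t1d3 \
        raidz c6t2d0 c6t2d1 c6t2d2 c6t2d3 c6t3d0 c6t3d1 c6t3d2 c6t3d3

    # After reinserting the pulled disk, the usual intent is to bring it
    # back online and re-check the pool's health.
    zpool online pool c6t1d2
    zpool status -x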
Eric Schrock
2006-Jun-02 17:59 UTC
[zfs-discuss] [Probably a bug] zfs disk got UNAVAIL state and cannot be repaired.
On Fri, Jun 02, 2006 at 10:42:08AM -0700, axa wrote:
> Hello All:
> I have a disk array with 16 SATA disks in a JBOD configuration, attached
> to an LSI FC HBA card. I use two raidz groups, each made up of 8 disks.
> The zpool status result is as follows:
>
> === zpool status ===
>
>         NAME        STATE     READ WRITE CKSUM
>         pool        ONLINE       0     0     0
>           raidz     ONLINE       0     0     0
>             c6t0d0  ONLINE       0     0     0
>             c6t0d1  ONLINE       0     0     0
>             c6t0d2  ONLINE       0     0     0
>             c6t0d3  ONLINE       0     0     0
>             c6t1d0  ONLINE       0     0     0
>             c6t1d1  ONLINE       0     0     0
>             c6t1d2  ONLINE       0     0     0
>             c6t1d3  ONLINE       0     0     0
>           raidz     ONLINE       0     0     0
>             c6t2d0  ONLINE       0     0     0
>             c6t2d1  ONLINE       0     0     0
>             c6t2d2  ONLINE       0     0     0
>             c6t2d3  ONLINE       0     0     0
>             c6t3d0  ONLINE       0     0     0
>             c6t3d1  ONLINE       0     0     0
>             c6t3d2  ONLINE       0     0     0
>             c6t3d3  ONLINE       0     0     0
>
> To test ZFS fault tolerance, I took one disk out of the array shelf for
> about 30 seconds and then inserted it again. ZFS seems to have detected
> the error and shows me that device c6t1d2's state is UNAVAIL. I am sure
> that the disk array's "lun mapping" and "config status" are all correct.
>
> === zpool status ===
> -bash-3.00# zpool list
> NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
> pool                   3.63T   91.1G   3.54T     2%  DEGRADED   -
> -bash-3.00# zpool status -x
>   pool: pool
>  state: DEGRADED
> status: One or more devices could not be opened.  Sufficient replicas exist for
>         the pool to continue functioning in a degraded state.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-D3
>  scrub: resilver completed with 0 errors on Sat Jun 3 00:14:00 2006
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         pool        DEGRADED     0     0     0
>           raidz     DEGRADED     0     0     0
>             c6t0d0  ONLINE       0     0     0
>             c6t0d1  ONLINE       0     0     0
>             c6t0d2  ONLINE       0     0     0
>             c6t0d3  ONLINE       0     0     0
>             c6t1d0  ONLINE       0     0     0
>             c6t1d1  ONLINE       0     0     0
>             c6t1d2  UNAVAIL      0     0     0
>             c6t1d3  ONLINE       0     0     0
>           raidz     ONLINE       0     0     0
>             c6t2d0  ONLINE       0     0     0
>             c6t2d1  ONLINE       0     0     0
>             c6t2d2  ONLINE       0     0     0
>             c6t2d3  ONLINE       0     0     0
>             c6t3d0  ONLINE       0     0     0
>             c6t3d1  ONLINE       0     0     0
>             c6t3d2  ONLINE       0     0     0
>             c6t3d3  ONLINE       0     0     0
>
> So, I used "zpool offline pool c6t1d2" to disconnect data access to
> c6t1d2, intending to then use "zpool online pool c6t1d2" to reattach the
> disk to the pool. But I got an error when I executed "zpool offline pool
> c6t1d2":
>
> === zpool offline ===
> -bash-3.00# zpool offline pool c6t1d2
> cannot offline 3c6t1d2: no valid replicas

This is a known bug. 'zpool offline' is too conservative in its check for
replicas.

> It did NOT work! I tried the "zpool export" and then "zpool import"
> commands to flush the ZFS memory cache. After "zpool import", the ZFS
> pool still has the "DEGRADED" status and the c6t1d2 disk state is still
> UNAVAIL. Oops... there is something wrong...

Once you do a 'zpool import', if the device was unavailable at the time of
the import, we have no idea whether the config is actually correct or not.
Hence we mark it persistently unavailable (we don't know what device it is,
only what it was at the time of import). Still, we should try persistently
unavailable devices if only for this circumstance. I'll file a bug.

> It seems Solaris sees the disk device "c6t1d2", but I don't know why
> "zpool status" shows the c6t1d2 state as UNAVAIL.
>
> I read the "Solaris ZFS Administration Guide" and tried every possible
> command, but the ZFS pool still shows DEGRADED and c6t1d2 cannot be fixed
> or replaced by any zpool command.
>
> Does anyone know how to repair this?

Does 'zpool replace' work?  In particular:

# zpool replace pool 3449041879167855716 c6t1d2

Note that offline/online is not the same as replacing a device. When you
online a device, it must match the known configuration (in this case,
because you did a 'zpool export', it doesn't match because of the bug
above). For a replace, it will treat the device as unknown and re-write the
label and resilver data appropriately.

If that doesn't work, then something is really wrong.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
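To make the distinction above concrete, the two operations would look roughly like this, assuming the pool name "pool" and the GUID shown earlier in the thread; this is a sketch, not output from the affected system.

    # online: re-open a device that is expected to still match the pool's
    # recorded configuration (its label/GUID must agree with what ZFS knows).
    zpool online pool c6t1d2

    # replace: treat the device as brand new, rewrite its label, and resilver
    # onto it; the missing device can be named by the GUID from 'zpool status'.
    zpool replace pool 3449041879167855716 c6t1d2

    # Check the resilver and pool state afterwards.
    zpool status -x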
axa
2006-Jun-03 12:47 UTC
[zfs-discuss] Re: [Probably a bug] zfs disk got UNAVAIL state and
Hello Eric:

Thanks for your reply. ;-) You and the ZFS team members are working really hard on ZFS. Thank you very much for giving us such a wonderful storage solution.

Okay... As you said:

> Does 'zpool replace' work?  In particular:
>
> # zpool replace pool 3449041879167855716 c6t1d2

I had tried that command (zpool replace c6t1d2 c6t1d2 OR zpool replace pool 3449041879167855716 c6t1d2) and got the following error message before I posted this question:

-bash-3.00# zpool replace pool 3449041879167855716 c6t1d2
cannot replace 3449041879167855716 with resilver: c6t1d2 busy
-bash-3.00#

And then I tried "zpool scrub", but that does not work either.

BTW, I tried offlining another disk device (c6t2d3) and then using it to replace c6t1d2, but that does not work either:

-bash-3.00# zpool replace -f pool c6t1d2 c6t2d3
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c6t2d3s0 is part of active ZFS pool pool. Please see zpool(1M).
-bash-3.00#

Perhaps I have to wait for a new ZFS release. It's OK, because no one would take a disk out before executing "zpool offline", right? I realize that the incident I mentioned is not a normal operation, and that is what caused the disk state to become UNAVAIL.

Thanks.

This message posted from opensolaris.org
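For anyone who hits the same "is part of active ZFS pool" error: the new device given to 'zpool replace' must not already belong to an active pool, which is why reusing c6t2d3 from the second raidz group is rejected. A sketch of the alternatives, where c7t0d0 stands in for a hypothetical spare disk that is not part of any pool:

    # Replace the missing vdev (named by its GUID) with a hypothetical spare
    # disk from outside the pool.
    zpool replace pool 3449041879167855716 c7t0d0

    # If the original disk becomes openable again, clearing the error state
    # and re-checking the pool may also be worth a try ('zpool clear' may not
    # be available in older releases).
    zpool clear pool
    zpool status -x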