Hi,

I'm currently trying to work with a quad-bay USB drive enclosure. I've created a raidz pool as follows:

bleonard@opensolaris:~# zpool status r5pool
pool: r5pool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
r5pool ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c1t0d0p0 ONLINE 0 0 0
c1t0d1p0 ONLINE 0 0 0
c1t0d2p0 ONLINE 0 0 0
c1t0d3p0 ONLINE 0 0 0
errors: No known data errors

If I pop a disk and run a zpool scrub, the fault is noted:

bleonard@opensolaris:~# zpool scrub r5pool
bleonard@opensolaris:~# zpool status r5pool
pool: r5pool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
scrub: scrub completed after 0h0m with 0 errors on Mon Jul 12 12:35:46 2010
config:
NAME STATE READ WRITE CKSUM
r5pool DEGRADED 0 0 0
raidz1 DEGRADED 0 0 0
c1t0d0p0 ONLINE 0 0 0
c1t0d1p0 ONLINE 0 0 0
c1t0d2p0 FAULTED 0 0 0 corrupted data
c1t0d3p0 ONLINE 0 0 0
errors: No known data errors

However, it's when I pop the disk back in that everything goes south. If I run a zpool scrub at this point, the command appears to just hang.

Running zpool status again shows the scrub will finish in 2 minutes, but it never does. You can see it's been running for 33 minutes already, and there's no data in the pool.

bleonard@opensolaris:/r5pool# zpool status r5pool
pool: r5pool
state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://www.sun.com/msg/ZFS-8000-HC
scrub: scrub in progress for 0h33m, 92.41% done, 0h2m to go
config:
NAME STATE READ WRITE CKSUM
r5pool ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c1t0d0p0 ONLINE 0 0 0
c1t0d1p0 ONLINE 0 0 0
c1t0d2p0 ONLINE 0 0 0
c1t0d3p0 ONLINE 0 0 0
errors: 24 data errors, use '-v' for a list

zpool scrub -s r5pool doesn't have any effect.

I can't even kill the scrub process. Even a reboot command at this point will hang the machine, so I have to hard power-cycle the machine to get everything back to normal. There must be a more elegant solution, right?
--
This message posted from opensolaris.org
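For anyone repeating this test, the device that ZFS considers unhealthy can be picked out of the status output mechanically. The following is only a sketch: the function name unhealthy_vdevs is hypothetical, and it assumes the config section starts at the NAME/STATE header and ends at the "errors:" line, as in the listings in this thread.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print every
# pool/vdev/device line in the config section whose STATE is not ONLINE.
unhealthy_vdevs() {
    awk '
        # The config table starts after the NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" line marks the end of the table.
        /^errors:/ { in_config = 0 }
        # Print "name state" for anything that is not ONLINE.
        in_config && NF >= 2 && $2 != "ONLINE" { print $1, $2 }
    '
}
```

It would be used as `zpool status r5pool | unhealthy_vdevs`. Against the DEGRADED listing in the original post it prints the pool, the raidz1 vdev, and c1t0d2p0 with their states.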
Hi Brian,
What are you trying to determine? How the pool behaves when a drive is
yanked out?
It's hard to tell how a pool will react with external USB drives. I think
it will also depend on how the system handles a device removal.
I created a similar raidz pool with non-USB devices, offlined a disk,
and ran a scrub. It works as expected. See the output below. Could
you retry your test with an offline rather than a yank and see if
the system hangs?
In addition, we don't support pools that are created on p* devices.
Use the c1t0d* names instead.
Thanks,
Cindy
# zpool create rzpool raidz1 c2t6d0 c2t7d0 c2t8d0
# zpool offline rzpool c2t8d0
# zpool status rzpool
pool: rzpool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
c2t6d0 ONLINE 0 0 0
c2t7d0 ONLINE 0 0 0
c2t8d0 OFFLINE 0 0 0
errors: No known data errors
# zpool scrub rzpool
# zpool status rzpool
pool: rzpool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0 in 0h0m with 0 errors on Mon Jul 12 09:56:36 2010
config:
NAME STATE READ WRITE CKSUM
rzpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
c2t6d0 ONLINE 0 0 0
c2t7d0 ONLINE 0 0 0
c2t8d0 OFFLINE 0 0 0
errors: No known data errors
# zpool status rzpool
pool: rzpool
state: ONLINE
scan: resilvered 14K in 0h0m with 0 errors on Mon Jul 12 10:12:55 2010
config:
NAME STATE READ WRITE CKSUM
rzpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c2t6d0 ONLINE 0 0 0
c2t7d0 ONLINE 0 0 0
c2t8d0 ONLINE 0 0 0
errors: No known data errors
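The same offline-instead-of-yank test could be sketched for the USB pool roughly as follows. This is an illustration, not a verified procedure for that enclosure: the function name offline_scrub_test is hypothetical, and the device name in the usage note is the one shown in the r5pool listing.

```shell
# Hypothetical wrapper around the offline / scrub / online sequence shown above.
# Arguments: pool name, device name as it appears in `zpool status`.
offline_scrub_test() {
    pool=$1
    dev=$2
    zpool offline "$pool" "$dev" || return 1  # take the disk out of service cleanly
    zpool scrub "$pool"                       # start a scrub of the degraded pool
    zpool status "$pool"                      # device should be listed as OFFLINE
    zpool online "$pool" "$dev"               # bring the disk back so it can resilver
    zpool status "$pool"
}
```

For the pool in question this would be run as `offline_scrub_test r5pool c1t0d2p0`. Note that `zpool scrub` returns immediately, so the first status call may still show the scrub in progress.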
On 07/12/10 10:45, Brian Leonard wrote:
> Hi,
>
> I'm currently trying to work with a quad-bay USB drive enclosure. I've created a raidz pool as follows:
>
> bleonard@opensolaris:~# zpool status r5pool
> pool: r5pool
> state: ONLINE
> scrub: none requested
> config:
>
> NAME STATE READ WRITE CKSUM
> r5pool ONLINE 0 0 0
> raidz1 ONLINE 0 0 0
> c1t0d0p0 ONLINE 0 0 0
> c1t0d1p0 ONLINE 0 0 0
> c1t0d2p0 ONLINE 0 0 0
> c1t0d3p0 ONLINE 0 0 0
>
> errors: No known data errors
>
> If I pop a disk and run a zpool scrub, the fault is noted:
>
> bleonard@opensolaris:~# zpool scrub r5pool
> bleonard@opensolaris:~# zpool status r5pool
> pool: r5pool
> state: DEGRADED
> status: One or more devices could not be used because the label is missing or
> invalid. Sufficient replicas exist for the pool to continue
> functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
> see: http://www.sun.com/msg/ZFS-8000-4J
> scrub: scrub completed after 0h0m with 0 errors on Mon Jul 12 12:35:46 2010
> config:
>
> NAME STATE READ WRITE CKSUM
> r5pool DEGRADED 0 0 0
> raidz1 DEGRADED 0 0 0
> c1t0d0p0 ONLINE 0 0 0
> c1t0d1p0 ONLINE 0 0 0
> c1t0d2p0 FAULTED 0 0 0 corrupted data
> c1t0d3p0 ONLINE 0 0 0
>
> errors: No known data errors
>
> However, it's when I pop the disk back in that everything goes south. If I run a zpool scrub at this point, the command appears to just hang.
>
> Running zpool status again shows the scrub will finish in 2 minutes, but it never does. You can see it's been running for 33 minutes already, and there's no data in the pool.
>
> bleonard@opensolaris:/r5pool# zpool status r5pool
> pool: r5pool
> state: ONLINE
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool clear'.
> see: http://www.sun.com/msg/ZFS-8000-HC
> scrub: scrub in progress for 0h33m, 92.41% done, 0h2m to go
> config:
>
> NAME STATE READ WRITE CKSUM
> r5pool ONLINE 0 0 0
> raidz1 ONLINE 0 0 0
> c1t0d0p0 ONLINE 0 0 0
> c1t0d1p0 ONLINE 0 0 0
> c1t0d2p0 ONLINE 0 0 0
> c1t0d3p0 ONLINE 0 0 0
>
> errors: 24 data errors, use '-v' for a list
>
> zpool scrub -s r5pool doesn't have any effect.
>
> I can't even kill the scrub process. Even a reboot command at this point will hang the machine, so I have to hard power-cycle the machine to get everything back to normal. There must be a more elegant solution, right?
Hi Cindy,

I'm trying to demonstrate how ZFS behaves when a disk fails. The drive enclosure I'm using (http://www.icydock.com/product/mb561us-4s-1.html) says it supports hot swap, but that's not what I'm experiencing. When I plug the disk back in, all 4 disks become unrecognizable until I restart the enclosure.

This same demo works fine when using USB sticks, and maybe that's because each USB stick has its own controller.

Thanks for your help,
Brian
Actually, there's still the primary issue of this post - the apparent
hang. At the moment, I have 3 zpool commands running, all apparently hung and
doing nothing:
bleonard@opensolaris:~$ ps -ef | grep zpool
root 20465 20411 0 18:10:44 pts/4 0:00 zpool clear r5pool
root 20408 20403 0 18:08:19 pts/3 0:00 zpool status r5pool
root 20396 17612 0 18:08:04 pts/2 0:00 zpool scrub r5pool
You can see that none of them is using any CPU to speak of, and they all seem to be waiting on something:
bleonard@opensolaris:~# ptime -p 20465
real 12:25.188031517
user 0.004037420
sys 0.008682963
bleonard@opensolaris:~# ptime -p 20408
real 15:03.977246851
user 0.002700817
sys 0.005662413
bleonard@opensolaris:~# ptime -p 20396
real 15:24.793176743
user 0.002954137
sys 0.014851215
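To put a number on "not very busy", the ptime figures can be reduced to the share of wall-clock time spent on CPU. This is a rough sketch: the function name cpu_share is hypothetical, and it assumes ptime prints real/user/sys lines with times in [[h:]m:]s form, as in the output above.

```shell
# Hypothetical helper: read `ptime -p PID` output on stdin and print
# (user + sys) as a percentage of real (wall-clock) time.
cpu_share() {
    awk '
        # Convert "s", "m:s" or "h:m:s" into seconds.
        function secs(t,    n, p, i, s) {
            n = split(t, p, ":")
            s = 0
            for (i = 1; i <= n; i++) s = s * 60 + p[i]
            return s
        }
        $1 == "real"                { real = secs($2) }
        $1 == "user" || $1 == "sys" { cpu += secs($2) }
        END { if (real > 0) printf "%.4f%%\n", 100 * cpu / real }
    '
}
```

Used as `ptime -p 20465 | cpu_share`, the zpool clear process above comes out at roughly 0.002% of its 12 minutes on CPU, which is consistent with a process that is blocked rather than working.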
And as I said earlier, I can't control+break or kill any of these
processes. Time for a hard reboot.
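Before power-cycling, one way to capture what the hung commands are waiting on is to dump their kernel thread stacks with mdb. This is a sketch under assumptions: it needs root, it relies on the mdb kernel dcmds ::pgrep, ::walk thread and ::findstack being available on this build, and the function name is hypothetical.

```shell
# Hypothetical helper: print the kernel stacks of all threads belonging to
# processes whose name matches the given pattern (default: zpool).
kernel_stacks() {
    name=${1:-zpool}
    # mdb -k attaches to the live kernel; the dcmd pipeline is read from stdin.
    echo "::pgrep $name | ::walk thread | ::findstack -v" | mdb -k
}
```

Running `kernel_stacks zpool > /var/tmp/zpool-stacks.txt` would save the stacks to a file on a pool other than the hung one, so they can be attached to a follow-up post or bug report.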
/Brian