Martin Mundschenk
2010-Feb-08 17:05 UTC
[zfs-discuss] Drive failure causes system to be unusable
Hi!

I have an OSOL box as a home file server. It has four 1 TB USB drives and a 1 TB FW drive attached. The USB devices are combined into a RAID-Z pool, and the FW drive acts as a hot spare.

Last night, one USB drive faulted and the following happened:

1. The zpool was no longer accessible.
2. Changing to a directory on the pool caused the tty to hang.
3. No reboot was possible.
4. The system had to be rebooted ungracefully by pushing the power button.

After reboot:

1. The zpool ran in a degraded state.
2. The spare device did NOT automatically go online.
3. The system did not boot to the usual run level; no auto-boot zones were started, and GDM did not start either.

    NAME          STATE     READ WRITE CKSUM
    tank          DEGRADED     0     0     0
      raidz1-0    DEGRADED     0     0     0
        c21t0d0   ONLINE       0     0     0
        c22t0d0   ONLINE       0     0     0
        c20t0d0   FAULTED      0     0     0  corrupted data
        c23t0d0   ONLINE       0     0     0
    cache
      c18t0d0     ONLINE       0     0     0
    spares
      c16t0d0     AVAIL

My questions:

1. Why does the system get stuck when a device faults?
2. Why does the hot spare not go online? (The manual says that going online automatically is the default behavior.)
3. Why does the system not boot to the usual run level when a zpool is in a degraded state at boot time?

Regards,
Martin
Richard Elling
2010-Feb-08 19:03 UTC
[zfs-discuss] Drive failure causes system to be unusable
On Feb 8, 2010, at 9:05 AM, Martin Mundschenk wrote:

> My questions:
>
> 1. Why does the system get stuck, when a device faults?

Are you sure there is not another fault here? What does "svcs -xv" show?
 -- richard

> 2. Why does the hot spare not go online? (The manual says, that going online automatically is the default behavior)
> 3. Why does the system not boot to the usual run level, when a zpool is in a degraded state at boot time?
Dr. Martin Mundschenk
2010-Feb-09 21:12 UTC
[zfs-discuss] Drive failure causes system to be unusable
On 08.02.2010, at 20:03, Richard Elling wrote:

> Are you sure there is not another fault here? What does "svcs -xv" show?

Well, I don't have the output of svcs -xv, since the fault has been recovered by now. It turned out not to be a hardware failure but unstable USB connectivity. But still: why does the system get stuck? And even when a USB plug is unhooked, why does the spare not go online?

Martin
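
The thread does not settle why the spare stayed in the AVAIL state. As a rough illustration only, the status listing from the first message can be scanned for FAULTED devices and AVAIL spares, and each pair turned into a manual `zpool replace <pool> <old> <new>` command, which is the documented way to bring a spare in by hand. The helper names `parse_status` and `suggest_replace` are hypothetical, the listing's indentation is reconstructed, and the parser is a minimal sketch that assumes the simple one-pool layout shown in the thread. Since the fault here turned out to be a USB connectivity problem, a replace may not be the right action; the sketch only prints the command.

```python
# Sketch only: scan `zpool status`-style text for faulted devices and
# available spares. parse_status and suggest_replace are hypothetical
# helpers for illustration, not part of any ZFS tool.

# Device listing from the first message (indentation reconstructed).
STATUS = """\
NAME          STATE     READ WRITE CKSUM
tank          DEGRADED     0     0     0
  raidz1-0    DEGRADED     0     0     0
    c21t0d0   ONLINE       0     0     0
    c22t0d0   ONLINE       0     0     0
    c20t0d0   FAULTED      0     0     0  corrupted data
    c23t0d0   ONLINE       0     0     0
cache
  c18t0d0     ONLINE       0     0     0
spares
  c16t0d0     AVAIL
"""


def parse_status(text):
    """Return (pool_name, faulted_devices, available_spares)."""
    pool = None
    faulted, spares = [], []
    section = "pool"  # rows before "cache"/"spares" belong to the pool itself
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the column header
        if len(fields) == 1:
            # A single-word row such as "cache" or "spares" starts a section.
            section = fields[0]
            continue
        name, state = fields[0], fields[1]
        if section == "pool" and pool is None:
            pool = name  # the first data row is the pool
        elif section == "pool" and state == "FAULTED":
            faulted.append(name)
        elif section == "spares" and state == "AVAIL":
            spares.append(name)
    return pool, faulted, spares


def suggest_replace(pool, faulted, spares):
    """Pair each faulted device with a spare as a `zpool replace` command."""
    return [
        f"zpool replace {pool} {bad} {spare}"
        for bad, spare in zip(faulted, spares)
    ]


pool, faulted, spares = parse_status(STATUS)
for command in suggest_replace(pool, faulted, spares):
    print(command)
```

The commands are printed rather than executed, so the output can be reviewed before anything is run against a live pool.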