Hi folks,

At home I run OpenSolaris x86 with a 4-drive RAID-Z (4x1TB) zpool and it's not in great shape. A fan stopped spinning and soon after the top disk failed (because, you know, heat rises). Naturally, OpenSolaris and ZFS didn't skip a beat; I didn't even notice it was dead until I saw the disk activity LED stuck on nearly a week later.

So I decided I would attach the disks to a 2nd system (with working fans) where I could back up the data to tape. So here's where I got dumb... I ran 'zpool export'. Of course, I never actually ended up attaching the disks to another machine, but ever since that export I've been unable to import the pool at all. I've ordered a replacement 1TB disk, but it hasn't arrived yet. Since I got no errors from the scrub I ran while the array was degraded, I'm pretty confident that the remaining 3 disks have valid data.

* Should I be able to import a degraded pool?
* If not, shouldn't there be a warning when exporting a degraded pool?
* If I replace the dead 1TB disk with a blank disk, might the import work?
* Are there any tools (or commercial services) for ZFS recovery?

I read a blog post (which naturally now I can't find) where someone in similar circumstances was able to import his pool after restoring /etc/zfs/zpool.cache from a backup taken before the 'zpool export'. Naturally this guy was doing it with ZFS-FUSE under Linux, so it's another step removed, but can someone explain to me the logic & risks of trying such a thing? Will it work if the zpool.cache comes from a 1-day/1-week/1-month-old backup?

So here's what I get...

peter@pickle:~$ pfexec zpool import
  pool: andre
    id: 5771661786439152324
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

        andre       FAULTED   corrupted data
          raidz1    DEGRADED
            c5t0d0  ONLINE
            c5t1d0  ONLINE
            c5t2d0  ONLINE
            c5t3d0  UNAVAIL   cannot open

Any constructive suggestions would be greatly appreciated.
Thanks
--Peter
--
This message posted from opensolaris.org
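First-pass diagnostics for an import failure like this generally look something like the following sketch. The device paths are illustrative, built from the c5t*d0 names in the output above; the slice suffix may differ on a given setup.

```shell
# Dump the four ZFS labels from one of the surviving disks; all four
# copies should agree, and should list every vdev in the raidz1.
zdb -l /dev/dsk/c5t0d0s0

# Try forcing the import, telling zpool where to search for devices:
pfexec zpool import -d /dev/dsk -f andre
```

If the labels on the three good disks disagree with each other, that points to a second failure beyond the dead drive.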
Daniel Carosone
2010-Apr-13 04:24 UTC
[zfs-discuss] ZFS RAID-Z1 Degraded Array won't import
On Mon, Apr 12, 2010 at 08:01:27PM -0700, Peter Tripp wrote:

> So I decided I would attach the disks to a 2nd system (with working fans) where I could back up the data to tape. So here's where I got dumb... I ran 'zpool export'. Of course, I never actually ended up attaching the disks to another machine, but ever since that export I've been unable to import the pool at all. I've ordered a replacement 1TB disk, but it hasn't arrived yet. Since I got no errors from the scrub I ran while the array was degraded, I'm pretty confident that the remaining 3 disks have valid data.
>
> * Should I be able to import a degraded pool?

Did you try with -f? I doubt it will help.

> * If not, shouldn't there be a warning when exporting a degraded pool?

Interesting point.

> * If I replace the dead 1TB disk with a blank disk, might the import work?

Only if the import is failing because the dead disk is nonresponsive in a way that makes the import hang. Otherwise, you'd import the pool first and then replace the drive.

> * Are there any tools (or commercial services) for ZFS recovery?

Dunno about commercial services; zpool and zdb seem to work most of the time.

> I read a blog post (which naturally now I can't find) where someone
> in similar circumstances was able to import his pool after restoring
> /etc/zfs/zpool.cache from a backup before the 'zpool
> export'. Naturally this guy was doing it with ZFS-FUSE under Linux,
> so it's another step removed, but can someone explain to me the
> logic & risks of trying such a thing? Will it work if the
> zpool.cache comes from 1day/1week/1month old backup?

If you have auto-snapshots of your running BE (/etc) from before the export, that should work fine. Note that you can pass import an argument "-c cachefile" so you don't have to interfere with the current system one.

You'd have to do this on the original system, I think.
The logic is that the cachefile contains copies of the labels of the missing devices, and can substitute for the devices themselves when importing a degraded pool (typically at boot).

This is useful enough that I'd like to see some of the reserved area between the on-disk labels and the first metaslab on each disk used to store a copy of the cache file / the same data. That way every pool member has the information about the other members necessary to import a degraded pool. Even if it had to be extracted first with zdb to be used as a separate zpool.cache as above, it would be helpful for this scenario.

--
Dan.
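A minimal sketch of the cachefile approach described above, assuming an old copy of zpool.cache survives somewhere; the snapshot path here is hypothetical, standing in for wherever a pre-export copy can be found.

```shell
# Recover a pre-export copy of the cache file from a snapshot
# (hypothetical path -- substitute wherever an old copy survives):
cp /rpool/.zfs/snapshot/old/etc/zfs/zpool.cache /tmp/zpool.cache.old

# Check that the saved cache still describes the pool's config:
zdb -C -U /tmp/zpool.cache.old andre

# Then point the import at that file instead of the live cache:
pfexec zpool import -c /tmp/zpool.cache.old andre
```

The older the saved cache, the more likely its device paths and pool config have drifted from reality, so a copy from just before the export is the best candidate.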
Richard Elling
2010-Apr-13 05:51 UTC
[zfs-discuss] ZFS RAID-Z1 Degraded Array won't import
On Apr 12, 2010, at 8:01 PM, Peter Tripp wrote:

> * Should I be able to import a degraded pool?

In general, yes. But it is complaining about corrupted data, which can be due to another failure.

> * If not, shouldn't there be a warning when exporting a degraded pool?

What should the warning say?

> * If I replace the dead 1TB disk with a blank disk, might the import work?

Have you tried simply removing the dead drive? Also, the ZFS Troubleshooting Guide has procedures that might help.

> * Are there any tools (or commercial services) for ZFS recovery?

Versions of OpenSolaris after b128 have additional recovery capability using the "zpool import -F" option.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
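The b128+ recovery option mentioned above can be tried as a dry run first. A sketch, assuming a build that includes PSARC 2009/479:

```shell
# -n combined with -F reports what the rewind would discard,
# without actually modifying the pool:
pfexec zpool import -nF andre

# If the dry run looks sane, rewind for real (transactions after the
# last consistent txg are lost):
pfexec zpool import -F andre
```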
> Did you try with -f? I doubt it will help.

Yep, no luck with -f, -F or -fF.

> > * If I replace the dead 1TB disk with a blank disk, might the import work?
>
> Only if the import is failing because the dead disk is nonresponsive
> in a way that makes the import hang. Otherwise, you'd import the pool
> first and then replace the drive.

That's what I thought. The bad disk isn't recognized by the SATA card BIOS, so it's not half-gone... it's totally missing (also tried with the disk removed, no difference).

> If you have auto-snapshots of your running BE (/etc) from before the export,
> that should work fine. Note that you can pass import an argument
> "-c cachefile" so you don't have to interfere with the current system one.
>
> You'd have to do this on the original system, I think.

Just checked, and I didn't have automatic snapshots/Time Slider enabled. My rpool only has an initial snapshot from install, and sadly I didn't have these four disks attached at that point. And naturally I don't have any backups. FML.

> The logic is that the cachefile contains copies of the labels of the missing
> devices, and can substitute for the devices themselves when importing a
> degraded pool (typically at boot).
>
> This is useful enough that I'd like to see some of the reserved area
> between the on-disk labels and the first metaslab on each disk used
> to store a copy of the cache file / the same data. That way every pool
> member has the information about the other members necessary to import a
> degraded pool. Even if it had to be extracted first with zdb to be
> used as a separate zpool.cache as above, it would be helpful for this
> scenario.

Yeah, I'd totally give up another 256KB/disk or whatever to make a degraded array easier to import. Anyone know if there's an RFE for this?

I may have a disk lying around from a previous OpenSolaris install or from Linux ZFS-FUSE where this zpool was originally created. Maybe I'll get lucky and find an old zpool.cache to try. Any other ideas? (Besides the obvious 'BACKUP YOUR F**KING /ETC ONCE IN A WHILE!')
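For anyone wanting to take that last piece of advice literally, one low-tech sketch is a periodic dated copy of the cache file to somewhere off the data pool (the destination path here is hypothetical):

```shell
# Keep dated copies of zpool.cache outside the data pool, e.g. run
# from root's crontab; prune old copies however you like.
cp -p /etc/zfs/zpool.cache "/export/backup/zpool.cache.$(date +%Y%m%d)"
```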
> > * Should I be able to import a degraded pool?
>
> In general, yes. But it is complaining about corrupted data, which can
> be due to another failure.

Any suggestions on how to discover what that failure might be?

> > * If not, shouldn't there be a warning when exporting a degraded pool?
>
> What should the warning say?

"You're exporting a degraded pool. It's recommended you address this issue by replacing missing/failing/failed disks and allowing the resilver to complete before exporting; otherwise this pool may subsequently fail to import. Use -f to export anyway."

> > * If I replace the dead 1TB disk with a blank disk, might the import work?
>
> Have you tried simply removing the dead drive?

Yep. No help.

> Also, the ZFS Troubleshooting Guide has procedures
> that might help.

I've been reading this document, but it seems to cover working either with zpools that are already imported, or with rpools (mirrored, not raidz) which cause the system to fail to boot. Maybe I'm misreading some of it.

> Versions of OpenSolaris after b128 have additional recovery capability
> using the "zpool import -F" option.

Ooo... that sounds promising. Is this PSARC 2009/479 or something different?
http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html

Since 2010.03 (aka 2010.someday) isn't coming anytime soon, can anyone recommend another distro with a LiveCD based on snv128 or later so I can give this a shot?