I have a zpool on a JBOD SE3320 that I was using for data with Solaris 10
(the root/usr/var filesystems were all UFS). Unfortunately, we had a bit of
a mixup with SCSI cabling and I believe that we created a SCSI target clash.
The system was unloaded and nothing happened until I ran "zpool status" at
which point things broke. After correcting all the cabling, Solaris panic'd
before reaching single user.

Sun Support could only suggest restoring from backups - but unfortunately,
we do not have backups of some of the data that we would like to recover.

Since OpenSolaris has a much newer version of ZFS, I thought I would give
OpenSolaris a try, and it looks slightly more promising, though I still
can't access the pool. The following is using snv_125 on a T2000.

root@als253:~# zpool import -F data
Nov 17 15:26:46 opensolaris zfs: WARNING: can't open objset for data/backup
root@als253:~# zpool status -v data
  pool: data
 state: FAULTED
status: An intent log record could not be read.
        Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        data         FAULTED      0     0     3  bad intent log
          raidz2-0   DEGRADED     0     0    18
            c2t8d0   FAULTED      0     0     0  too many errors
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c3t8d0   ONLINE       0     0     0
            c3t9d0   ONLINE       0     0     0
            c3t10d0  ONLINE       0     0     0
            c3t11d0  ONLINE       0     0     0
            c3t12d0  DEGRADED     0     0     0  too many errors
            c3t13d0  ONLINE       0     0     0
root@als253:~# zpool online data c2t8d0
Nov 17 15:28:42 opensolaris zfs: WARNING: can't open objset for data/backup
cannot open 'data': pool is unavailable
root@als253:~# zpool clear data
cannot clear errors for data: one or more devices is currently unavailable
root@als253:~# zpool clear -F data
cannot open '-F': name must begin with a letter
root@als253:~# zpool status data
  pool: data
 state: FAULTED
status: One or more devices are faulted in response to persistent errors.
        There are insufficient replicas for the pool to continue functioning.
action: Destroy and re-create the pool from a backup source. Manually marking
        the device repaired using 'zpool clear' may allow some data to be
        recovered.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        data         FAULTED      0     0     1  corrupted data
          raidz2-0   FAULTED      0     0     6  corrupted data
            c2t8d0   FAULTED      0     0     0  too many errors
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c3t8d0   ONLINE       0     0     0
            c3t9d0   ONLINE       0     0     0
            c3t10d0  ONLINE       0     0     0
            c3t11d0  ONLINE       0     0     0
            c3t12d0  DEGRADED     0     0     0  too many errors
            c3t13d0  ONLINE       0     0     0
root@als253:~#

Annoyingly, data/backup is not a filesystem I'm especially worried about -
I'd just like to get access to the other filesystems on it. Is it possible
to hack the pool to make data/backup just disappear?

For that matter:
1) Why is the whole pool faulted when n-2 vdevs are online?
2) Given that metadata is triplicated, where did the objset go?

-- 
Peter Jeremy
There is a new PSARC case (in b126?) that allows rolling back to the latest
functioning uberblock. Maybe it can help you?
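I haven't used the new code myself, so treat the following only as a rough
sketch of what the interface is supposed to look like; the exact options may
differ in the build that ships it. The dry-run form reports how far back the
pool would be rewound and how much would be discarded, without modifying
anything:

    # zpool import -nF data

If that looks acceptable, the same command without -n performs the actual
rewind and import:

    # zpool import -F data

There is also supposed to be an "extreme rewind" option that searches
further back through older uberblocks (and so may discard more
transactions); I'd treat it as a last resort if plain -F fails:

    # zpool import -nFX data
    # zpool import -FX data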
Peter Jeremy wrote:
> I have a zpool on a JBOD SE3320 that I was using for data with Solaris
> 10 (the root/usr/var filesystems were all UFS). Unfortunately, we had
> a bit of a mixup with SCSI cabling and I believe that we created a
> SCSI target clash. The system was unloaded and nothing happened until
> I ran "zpool status" at which point things broke. After correcting
> all the cabling, Solaris panic'd before reaching single user.

Do you have a crash dump of this panic saved?

> Sun Support could only suggest restoring from backups - but
> unfortunately, we do not have backups of some of the data that we
> would like to recover.
>
> Since OpenSolaris has a much newer version of ZFS, I thought I would
> give OpenSolaris a try, and it looks slightly more promising, though I
> still can't access the pool. The following is using snv_125 on a T2000.
>
> root@als253:~# zpool import -F data
> Nov 17 15:26:46 opensolaris zfs: WARNING: can't open objset for data/backup
> root@als253:~# zpool status -v data
> [... first 'zpool status -v data' output trimmed ...]
> root@als253:~# zpool online data c2t8d0
> Nov 17 15:28:42 opensolaris zfs: WARNING: can't open objset for data/backup
> cannot open 'data': pool is unavailable
> root@als253:~# zpool clear data
> cannot clear errors for data: one or more devices is currently unavailable
> root@als253:~# zpool clear -F data
> cannot open '-F': name must begin with a letter

Option -F is a new one, added with the pool recovery support, so it will
only be available from build 128 onwards.

> root@als253:~# zpool status data
> [... second 'zpool status data' output trimmed ...]
>
> Annoyingly, data/backup is not a filesystem I'm especially worried
> about - I'd just like to get access to the other filesystems on it.

I think it should be possible, at least in read-only mode. I cannot tell
whether full recovery will be possible, but there is at least a good chance
of getting some data back.
You can try build 128 as soon as it becomes available, or you can try to
build BFU archives from source and apply them to your build 125 BE.

> Is it possible to hack the pool to make data/backup just disappear?
> For that matter:
> 1) Why is the whole pool faulted when n-2 vdevs are online?

RAID-Z2 should survive two disk failures. But in this case, as you mention,
there was a misconfiguration on the storage side that may have caused a
SCSI target clash. ZFS verifies checksums, and here it looks like some
critical metadata block(s) in the most recent pool state fail checksum
verification, so corruption is present on some of the online disks too.
With one disk faulted and another degraded, ZFS is not able to identify
which of the remaining disks is bad by using combinatorial reconstruction.

> 2) Given that metadata is triplicated, where did the objset go?

Metadata replication helps to protect against failures localized in space,
but as all copies of metadata are written at the same time, it cannot
protect against failures localized in time.

regards,
victor
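P.S. If you want to see what ZFS has to work with before attempting any
recovery, it can be worth dumping the on-disk labels (which contain the
uberblock array) from one of the ONLINE disks:

    # zdb -l /dev/rdsk/c2t9d0s0

and examining the non-imported pool and its active uberblock:

    # zdb -e -u data

(The s0 slice name assumes the usual whole-disk EFI labelling, and zdb
options and output vary somewhat between builds, so treat this as a rough
sketch rather than a recipe.)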
On 2009-Nov-19 02:57:31 +0300, Victor Latushkin <Victor.Latushkin@Sun.COM> wrote:
>> all the cabling, Solaris panic'd before reaching single user.
>
> Do you have a crash dump of this panic saved?

Yes. It was provided to Sun Support.

> Option -F is a new one, added with the pool recovery support, so it will
> only be available from build 128 onwards.

OK, thanks - I knew it was new, but I wasn't certain exactly which build it
had been integrated into.

> I think it should be possible, at least in read-only mode. I cannot tell
> whether full recovery will be possible, but there is at least a good
> chance of getting some data back.

That's what I was hoping.

> You can try build 128 as soon as it becomes available, or you can try to
> build BFU archives from source and apply them to your build 125 BE.

I'm currently discussing this off-line with Tim Haley.

> Metadata replication helps to protect against failures localized in
> space, but as all copies of metadata are written at the same time, it
> cannot protect against failures localized in time.

Thanks for that. I suspected it might be something like this.

-- 
Peter Jeremy
On 2009-Nov-18 08:40:41 -0800, Orvar Korvar <knatte_fnatte_tjatte@yahoo.com> wrote:
> There is a new PSARC case (in b126?) that allows rolling back to the
> latest functioning uberblock. Maybe it can help you?

It's in b128, and the feedback I've received suggests it will work. I've
been trying to get the relevant ZFS bits for my b127 system but haven't
managed to get them to work so far.

-- 
Peter Jeremy
Hello,

This sounds similar to a problem I had a few months ago:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6869512

I don't have a solution, but information from this possibly related bug
may help.

Andrew
On 2009-Nov-18 11:50:44 +1100, I wrote:
> I have a zpool on a JBOD SE3320 that I was using for data with Solaris
> 10 (the root/usr/var filesystems were all UFS). Unfortunately, we had
> a bit of a mixup with SCSI cabling and I believe that we created a
> SCSI target clash. The system was unloaded and nothing happened until
> I ran "zpool status" at which point things broke. After correcting
> all the cabling, Solaris panic'd before reaching single user.

I wound up installing OpenSolaris snv_128a on some spare disks and this
enabled me to recover the data. Thanks to Tim Haley and Victor Latushkin
for their assistance.

As a first attempt, 'zpool import -F data' said "Destroy and re-create the
pool from a backup source."

'zpool import -nFX data' initially ran the system out of swap (I hadn't
attached any swap and it only has 8GB RAM):

WARNING: /etc/svc/volatile: File system full, swap space limit exceeded
INIT: Couldn't write persistent state file `/etc/svc/volatile/init.state'.

After rebooting and adding some swap (which didn't seem to ever get used),
it did work (though it took several hours - unfortunately, I didn't record
exactly how long):

# zpool import -nFX data
Would be able to return data to its state as of Thu Jan 01 10:00:00 1970.
Would discard approximately 369 minutes of transactions.
# zpool import -FX data
Pool data returned to its state as of Thu Jan 01 10:00:00 1970.
Discarded approximately 369 minutes of transactions.
cannot share 'data/backup': share(1M) failed
cannot share 'data/JumpStart': share(1M) failed
cannot share 'data/OS_images': share(1M) failed
#

I notice that the two times aren't consistent, but the data appears to be
present and a 'zpool scrub' reported no errors. I have reverted back to
Solaris 10 and successfully copied all the data off.

-- 
Peter Jeremy
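P.S. For anyone else who hits the swap exhaustion during 'zpool import
-nFX': on an OpenSolaris system with a ZFS root, temporary swap can be
added on a zvol along these lines (the pool and volume names here are just
examples - adjust to your own setup and memory size):

    # zfs create -V 16G rpool/swap2
    # swap -a /dev/zvol/dsk/rpool/swap2
    # swap -l

and then re-run the dry-run import:

    # zpool import -nFX data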