One of the disks in my RAIDZ array was behaving oddly (lots of bus errors),
so I took it offline to replace it. I shut down the server, put in the
replacement disk, and rebooted -- only to discover that a different drive
had chosen that moment to fail completely. So I replaced the failing (but
not yet failed) drive and tried to import the pool. Failure, because that
disk is marked offline. Is there any way to recover from this?

The system was running b118. Booting off my OS into single-user mode makes
the system extremely unhappy (any zfs command hangs the system for a very
long time, and I get an error about being out of VM)... Booting off the
osol live CD gives me:

  pool: media
    id: 4928877878517118807
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        media       UNAVAIL  insufficient replicas
          raidz1    UNAVAIL  insufficient replicas
            c7t5d0  UNAVAIL  cannot open
            c7t2d0  ONLINE
            c7t4d0  ONLINE
            c7t3d0  ONLINE
            c7t0d0  OFFLINE
            c7t7d0  ONLINE
            c7t1d0  ONLINE
            c7t6d0  ONLINE
--
This message posted from opensolaris.org
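For reference, the replacement workflow described above would normally look something like the sketch below. Device and pool names are taken from the thread; `zpool` is stubbed out so the sequence can be printed as a dry run rather than executed against a live pool -- on a real system you would drop the stub and run the commands as root.

```shell
# Dry-run sketch of a RAIDZ disk-replacement sequence (names from the thread).
zpool() { echo "would run: zpool $*"; }   # stub for illustration only

plan=$(
  zpool offline media c7t0d0   # take the misbehaving disk out of service
  # ...shut down, physically swap the disk, boot back up...
  zpool replace media c7t0d0   # resilver onto the replacement disk
  zpool status  media          # watch the resilver complete
)
printf '%s\n' "$plan"
```

The failure mode in the thread is exactly what this sequence cannot survive: a second disk (c7t5d0) dying while c7t0d0 is still OFFLINE leaves the raidz1 vdev with insufficient replicas.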
paul at paularcher.org
2009-Sep-30 13:25 UTC
[zfs-discuss] Help importing pool with "offline" disk
> One of the disks in my RAIDZ array was behaving oddly (lots of bus errors)
> so I took it offline to replace it. I shut down the server, put in the
> replacement disk, and rebooted. Only to discover that a different drive
> had chosen that moment to fail completely. So I replace the failing (but
> not yet failed) drive and try and import the pool. Failure, because that
> disk is marked offline. Is there any way to recover from this?
> [...]

zpool online media c7t0d0

Paul
> zpool online media c7t0d0

jack at opensolaris:~# zpool online media c7t0d0
cannot open 'media': no such pool

Already tried that ;-)
--
This message posted from opensolaris.org
On Wed, 30 Sep 2009 11:01:13 PDT, Carson Gaspar
<carson.gaspar at gmail.com> wrote:

>> zpool online media c7t0d0
>
> jack at opensolaris:~# zpool online media c7t0d0
> cannot open 'media': no such pool
>
> Already tried that ;-)

Perhaps you can try some subcommand of cfgadm to get c7t0d0
online, then import the pool again?
--
  (  Kees Nuyt
  )
c[_]
paul at paularcher.org
2009-Sep-30 19:23 UTC
[zfs-discuss] Help importing pool with "offline" disk
>> zpool online media c7t0d0
>
> jack at opensolaris:~# zpool online media c7t0d0
> cannot open 'media': no such pool
>
> Already tried that ;-)

D'oh! Of course -- I should have been paying attention to the fact that
the pool wasn't imported.

My guess is that if you move /etc/zfs/zpool.cache out of the way, then
reboot, ZFS will have to figure out what disks are out there again, find
your disk, and realize it is online.

Paul
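The suggestion above boils down to moving the pool cache file aside so that ZFS has to re-probe devices at the next boot; on OpenSolaris the cache file ZFS consults is /etc/zfs/zpool.cache. A sandboxed sketch of the steps, with a temp directory standing in for /etc/zfs so nothing on a live system is touched:

```shell
# Sketch: move the pool cache aside so ZFS rediscovers devices on reboot.
# A temp directory stands in for /etc/zfs; on a real system you would
# operate on /etc/zfs/zpool.cache as root and then reboot.
etc_zfs=$(mktemp -d)
: > "$etc_zfs/zpool.cache"                            # stand-in cache file

mv "$etc_zfs/zpool.cache" "$etc_zfs/zpool.cache.bak"  # move it out of the way
ls "$etc_zfs"                                         # only the .bak remains
```

As the follow-up in the thread shows, this particular recovery did not help here, because the OFFLINE state lives in the on-disk ZFS labels, not only in the cache file.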
> D'oh! Of course -- I should have been paying attention to the fact that
> the pool wasn't imported.
> My guess is that if you move /etc/zfs/zpool.cache out of the way, then
> reboot, ZFS will have to figure out what disks are out there again, find
> your disk, and realize it is online.

Sadly, no. Booting off the OpenSolaris LiveCD (which has no cache) doesn't
help. The "offline" nature of the disk must be recorded in the ZFS data on
the disks somewhere...

--
Carson
--
This message posted from opensolaris.org
> On Wed, 30 Sep 2009 11:01:13 PDT, Carson Gaspar
> <carson.gaspar at gmail.com> wrote:
>
>>> zpool online media c7t0d0
>>
>> jack at opensolaris:~# zpool online media c7t0d0
>> cannot open 'media': no such pool
>>
>> Already tried that ;-)
>
> Perhaps you can try some subcommand of cfgadm to get c7t0d0
> online, then import the pool again?

cfgadm is happy -- the offline problem is in ZFS somewhere:

c7::dsk/c7t0d0     disk    connected    configured   unknown
c7::dsk/c7t1d0     disk    connected    configured   unknown
c7::dsk/c7t2d0     disk    connected    configured   unknown
c7::dsk/c7t3d0     disk    connected    configured   unknown
c7::dsk/c7t4d0     disk    connected    configured   unknown
c7::dsk/c7t6d0     disk    connected    configured   unknown
c7::dsk/c7t7d0     disk    connected    configured   unknown

--
Carson
--
This message posted from opensolaris.org
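The cfgadm check above can be scripted as a quick sanity pass. The sketch below parses the listing pasted in the thread (note that c7t5d0, the dead disk, does not appear at all); on a live system you would pipe `cfgadm -al` in place of the here-string:

```shell
# Sketch: verify every c7 disk cfgadm reports is connected and configured.
# The listing is the one from the thread; substitute `cfgadm -al` output
# on a live system.
listing='c7::dsk/c7t0d0 disk connected configured unknown
c7::dsk/c7t1d0 disk connected configured unknown
c7::dsk/c7t2d0 disk connected configured unknown
c7::dsk/c7t3d0 disk connected configured unknown
c7::dsk/c7t4d0 disk connected configured unknown
c7::dsk/c7t6d0 disk connected configured unknown
c7::dsk/c7t7d0 disk connected configured unknown'

ndisks=$(printf '%s\n' "$listing" | grep -c '^c7::dsk/')
nbad=$(printf '%s\n' "$listing" \
       | awk '$3 != "connected" || $4 != "configured"' | wc -l)
echo "disks seen: $ndisks, not configured: $nbad"
```

With all seven surviving disks connected and configured, the OFFLINE marker on c7t0d0 can only be coming from ZFS itself, which is exactly what the reply concludes.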
Victor Latushkin
2009-Sep-30 19:47 UTC
[zfs-discuss] Help importing pool with "offline" disk
Carson Gaspar wrote:
> Sadly, no. Booting off the OpenSolaris LiveCD (which has no cache)
> doesn't help. The "offline" nature of the disk must be in the ZFS data
> on the disks somewhere...

Is zdb happy with your pool?

Try e.g.

  zdb -eud <poolname>

Victor
Victor Latushkin wrote:
> Is zdb happy with your pool?
>
> Try e.g.
>
>   zdb -eud <poolname>

I'm booted back into snv118 (booting with the damaged pool disks
disconnected so the host would come up without throwing up). After hot
plugging the disks, I get:

bash-3.2# /usr/sbin/zdb -eud media
zdb: can't open media: File exists

"zpool status media" is hanging, and top shows that I'm spending ~50% of
CPU time in the kernel -- I'll see what it says when it finally returns.
Let me know if there's anything else I can do to help you help me,
including giving you a login on the server.

--
Carson
Carson Gaspar wrote:
> I'm booted back into snv118 (booting with the damaged pool disks
> disconnected so the host would come up without throwing up). After hot
> plugging the disks, I get:
>
> bash-3.2# /usr/sbin/zdb -eud media
> zdb: can't open media: File exists
> [...]

OK, things are now different (possibly better?):

bash-3.2# /usr/sbin/zpool status media
  pool: media
 state: FAULTED
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        media       FAULTED      0     0     1  corrupted data
          raidz1    DEGRADED     0     0     6
            c7t5d0  UNAVAIL      0     0     0  cannot open
            c7t2d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0

I suspect that an uberblock rollback might help me -- googling all the
references now, but if someone has any advice, I'd be grateful.

--
Carson
Carson Gaspar wrote:
> OK, things are now different (possibly better?):
> [...]
> I suspect that an uberblock rollback might help me -- googling all the
> references now, but if someone has any advice, I'd be grateful.

I'll also note that the kernel is certainly doing _something_ with my
pool... from "iostat -n -x 5":

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   40.5    5.4 1546.4    0.0  0.0  0.3    0.0    7.5   0  19 c7t0d0
   40.5    5.4 1546.4    0.0  0.0  0.6    0.0   12.1   0  31 c7t1d0
   44.1    5.8 1660.8    0.0  0.0  0.4    0.0    7.6   0  21 c7t2d0
   41.9    5.4 1546.4    0.0  0.0  0.3    0.0    6.6   0  22 c7t3d0
   40.7    5.8 1546.4    0.0  0.0  0.5    0.0    9.9   0  25 c7t4d0
   40.3    5.4 1546.4    0.0  0.0  0.4    0.0    8.5   0  20 c7t6d0
   40.5    5.4 1546.4    0.0  0.0  0.4    0.0    7.9   0  23 c7t7d0

--
Carson
Carson Gaspar wrote:
> I'll also note that the kernel is certainly doing _something_ with my
> pool... from "iostat -n -x 5":
> [...]

And now I know what:

bash-3.2# pgrep zfsdle | wc
   15198   15198   86454
bash-3.2# uname -a
SunOS gandalf.taltos.org 5.11 snv_118 i86pc i386 i86xpv

I see a few other folks reporting this, but no responses.

I don't see any bugs filed against this, but I know the search engine is
"differently coded"...

--
Carson
Carson Gaspar wrote:
> And now I know what:
>
> bash-3.2# pgrep zfsdle | wc
>    15198   15198   86454
> [...]
> I see a few other folks reporting this, but no responses.

And they have all been spawned by:

bash-3.2# ps -fp 991
     UID   PID  PPID   C    STIME TTY         TIME CMD
    root   991     1   1 15:30:40 ?           1:50 /usr/lib/sysevent/syseventconfd

I renamed /etc/sysevent/config/SUNW,EC_dev_status,ESC_dev_dle,sysevent.conf
and restarted syseventd to stop the madness.

Anyone know what has gone so horribly wrong? The other reports I've seen
were against snv_123, so the current release appears to have the same bug.

--
Carson
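The workaround described above, as a dry-run sketch: the sysevent config path is the one quoted in the thread, while the SMF service name for syseventd (`system/sysevent`) is an assumption on my part. `mv` and `svcadm` are stubbed so the sequence only prints what it would do; drop the stubs to run it for real as root.

```shell
# Dry-run sketch of the zfsdle-storm workaround from the thread.
mv()     { echo "would run: mv $*"; }       # stubs for illustration only
svcadm() { echo "would run: svcadm $*"; }   # service name is an assumption

conf=/etc/sysevent/config/SUNW,EC_dev_status,ESC_dev_dle,sysevent.conf
steps=$(
  mv "$conf" "$conf.disabled"       # stop new zfsdle instances from spawning
  svcadm restart system/sysevent    # restart syseventd to drop the config
)
printf '%s\n' "$steps"
```

Renaming the config only suppresses the symptom (the runaway event handlers); it does not address whatever event flood is triggering them.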
Carson Gaspar wrote:
> OK, things are now different (possibly better?):
>
> bash-3.2# /usr/sbin/zpool status media
>   pool: media
>  state: FAULTED
> [...]
> I suspect that an uberblock rollback might help me -- googling all the
> references now, but if someone has any advice, I'd be grateful.

And I'm afraid I just did something foolish. zdb wasn't working, so I
tried exporting the pool. Now I'm back to:

bash-3.2# /usr/sbin/zpool import
  pool: media
    id: 4928877878517118807
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

        media       UNAVAIL  insufficient replicas
          raidz1    UNAVAIL  insufficient replicas
            c7t5d0  UNAVAIL  cannot open
            c7t2d0  ONLINE
            c7t4d0  ONLINE
            c7t3d0  ONLINE
            c7t0d0  OFFLINE
            c7t7d0  ONLINE
            c7t1d0  ONLINE
            c7t6d0  ONLINE

Can anyone help me get c7t0d0 "ONLINE" and roll back the uberblocks so I
can import the pool and save my data?

--
Carson
Also, can someone tell me if I'm too late for an uberblock rollback to
help me? Diffing "zdb -l" output between c7t0 and c7t1, I see:

- txg=12968048
+ txg=12968082

Is that too large a txg gap to roll back, or is it still possible?

Carson Gaspar wrote:
> And I'm afraid I just did something foolish. zdb wasn't working, so I
> tried exporting the pool. Now I'm back to:
>
> bash-3.2# /usr/sbin/zpool import
>   pool: media
>     id: 4928877878517118807
>  state: UNAVAIL
> [...]
>
> Can anyone help me get c7t0d0 "ONLINE" and roll back the uberblocks so I
> can import the pool and save my data?
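The txg comparison in the question above can be done mechanically. The sketch below extracts the `txg=` value from each device's label text and computes the gap, using the two values reported in the thread; on a live system you would capture the label with something like `zdb -l /dev/rdsk/c7t0d0s0` (the s0 slice path is an assumption) instead of the sample strings:

```shell
# Sketch: compute the txg gap between the labels of the offlined disk
# (c7t0) and a healthy one (c7t1). Values are the ones reported in the
# thread; substitute real 'zdb -l' output on a live system.
label_c7t0='txg=12968048'
label_c7t1='txg=12968082'

txg0=${label_c7t0#txg=}     # strip the 'txg=' prefix
txg1=${label_c7t1#txg=}
gap=$((txg1 - txg0))
echo "txg gap between c7t0 and c7t1: $gap"
```

A gap of a few dozen transaction groups is the stale-by-how-much question the poster is really asking; the thread unfortunately ends before anyone answers it.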