Will Murnane
2009-Aug-04 19:58 UTC
[zfs-discuss] Sol10u7: can't "zpool remove" missing hot spare
I'm using Solaris 10u6 updated to u7 via patches, and I have a pool with a mirrored pair and a (shared) hot spare. We reconfigured disks a while ago and now the controller is c4 instead of c2. The hot spare was originally on c2, and apparently on rebooting it didn't get found. So, I looked up what the new name for the hot spare was, then added it to the pool with "zpool add home1 spare c4t19d0". I then tried to remove the original name for the hot spare:

root@box:~# zpool remove home1 c2t0d8
root@box:~# zpool status home1
  pool: home1
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        home1        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c4t17d0  ONLINE       0     0     0
            c4t24d0  ONLINE       0     0     0
        spares
          c2t0d8     UNAVAIL   cannot open
          c4t19d0    AVAIL

errors: No known data errors

So, how can I convince the pool to release its grasp on c2t0d8?

Thanks!
Will
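For anyone checking a pool for the same symptom, the stale entry can be picked out of saved status output mechanically. This is a minimal sketch and was not posted in the thread: the function name unavail_spares is made up for illustration, and it assumes the "zpool status" layout from Will's message (a "spares" line followed by one device per line, ending at a blank line or the "errors:" line).

```shell
# Hypothetical helper (not from the thread): list spares that
# `zpool status` reports as UNAVAIL.
# Reads status text on stdin, so it can be tried on saved output.
unavail_spares() {
    awk '
        $1 == "spares"             { in_spares = 1; next }  # spares section starts
        NF == 0 || $1 == "errors:" { in_spares = 0 }        # section ends
        in_spares && $2 == "UNAVAIL" { print $1 }           # device name only
    '
}

# On a live system (pool name taken from the thread):
#   zpool status home1 | unavail_spares
```

Only the spares section is examined, so an UNAVAIL disk inside the mirror itself would not be listed here.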
Cindy.Swearingen at Sun.COM
2009-Aug-04 23:05 UTC
[zfs-discuss] Sol10u7: can't "zpool remove" missing hot spare
Hi Will,

It looks to me like you are running into this bug:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6664649

This is fixed in Nevada and a fix will also be available in an upcoming Solaris 10 release.

This doesn't help you now, unfortunately. I don't think this ghost of a device will cause any problems, but maybe someone has a workaround for removing a device that doesn't actually exist any more (?)

In general, ZFS handles the changing devices okay except for this bug with spares. I believe the pre-reconfiguration workaround is to remove the spares before you reconfigure the devices and then add them back.

Cindy
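The pre-reconfiguration workaround Cindy describes could be prepared along these lines. This is only a sketch under stated assumptions: spare_remove_cmds is a hypothetical name, the function prints commands rather than running them, and it relies on the same "spares" section layout as the status output quoted in this thread.

```shell
# Hypothetical helper (not from the thread): print, without running,
# one "zpool remove" command per configured spare.
# $1 is the pool name; `zpool status` text for that pool arrives on stdin.
spare_remove_cmds() {
    pool=$1
    awk '
        $1 == "spares"             { in_spares = 1; next }
        NF == 0 || $1 == "errors:" { in_spares = 0 }
        in_spares                  { print $1 }
    ' | while read -r dev; do
        printf 'zpool remove %s %s\n' "$pool" "$dev"
    done
}

# Before the hardware change (pool name taken from the thread):
#   zpool status home1 | spare_remove_cmds home1
```

The printed commands are meant to be reviewed and run by hand before the controllers change. The new device names cannot be derived in advance, so the re-add step ("zpool add home1 spare <new name>") still has to be done manually after the reboot.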
Will Murnane
2009-Aug-05 00:34 UTC
[zfs-discuss] Sol10u7: can't "zpool remove" missing hot spare
On Tue, Aug 4, 2009 at 19:05, <Cindy.Swearingen at sun.com> wrote:
> Hi Will,
>
> It looks to me like you are running into this bug:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6664649
>
> This is fixed in Nevada and a fix will also be available in an
> upcoming Solaris 10 release.

That looks like exactly the problem we hit. Thanks for Googling for me.

> This doesn't help you now, unfortunately.

Would it cause problems to temporarily import the pool on an OpenSolaris machine, remove the spare, and move it back to the Sol10 machine? I think it'd be safe provided I don't do "zpool upgrade" or anything like that, but I'd like to make sure.

Thanks,
Will
Jordan Schwartz
2009-Aug-05 00:52 UTC
[zfs-discuss] Sol10u7: can't "zpool remove" missing hot spare
We ran into something similar with controllers changing after an X4500 to X4540 upgrade. In our case the spares were in a separate data pool, so the recovery procedure we developed was relatively easy to implement as long as downtime could be scheduled.

You may be able to tweak the procedure to boot off a jumpstart server or CD/DVD. I believe the trick will be to format the right disks with the rpool exported.

For future upgrades where you know the controllers will change, remove the spares ahead of time. Then, after booting into the new system, compare the disks listed in zpool status against those listed in format to find your spares and re-add them.

Hope this helps,
Jordan

Procedure to recover invalid or corrupted spares after an X4500 to X4540 SC upgrade:

1. The data zpool, z, was exported and the server (X4500) was shut down.
2. In this case, after the SC was upgraded to a 4540, new root disks were installed with Solaris 10 Update 6 and the following patches were installed, including a ZFS patch:
   126420-02 138286-02 139387-02 139580-02 140176-01 140191-01 139463-01 139467-04 138889-07
3. After the X4540 was booted, the z zpool was imported.
4. zpool status z was run, and the invalid spares were listed as c6t0d0 and c6t4d0. Reviewing the "echo | format" output, it was obvious that c6 no longer existed; c0 through c5 became the names of the 6 channels in the X4540 after the OS upgrade.
5. The following awk one-liner was used to list all the disks being used by ZFS, and this was compared to the list of all known disks in the system:
   zpool status | awk '$1 ~/^c[0-9]/ {print $1}' | sort
   The disks c4t0d0 and c4t1d0 were listed in format but not in the zpool output.
6. Attempts to zpool add or replace the disks failed with an error stating that the disks were spares in the z zpool.
7. The z zpool was exported.
8. format was run on c4t0d0 and c4t1d0. In the fdisk menu, partition 1 was removed and the information was saved.
9. The z zpool was then imported.
10. The failed spares were removed from the zpool.
11. The "missing" disks were re-added as spares, for example: zpool add z spare c4t1d0
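The comparison between the zpool disk list and the format disk list in the procedure above can be sketched as a small script. This is an illustration, not Jordan's actual tooling: the function names are hypothetical, both inputs are read from saved text files, and the format parsing assumes entries of the form "0. c4t0d0 <...>" with the device name in the second field.

```shell
# Hypothetical helpers (not from the thread).

# Disks named in saved `zpool status` text, using the awk filter from the
# procedure above.
zpool_disks() { awk '$1 ~ /^c[0-9]/ { print $1 }' "$1" | LC_ALL=C sort -u; }

# Disks named in saved `echo | format` text.
# Assumption: each disk entry looks like "0. c4t0d0 <...>".
format_disks() { awk '$1 ~ /^[0-9]+\.$/ { print $2 }' "$1" | LC_ALL=C sort -u; }

# Print disks that format lists but no pool references.
# $1 = file with zpool status text, $2 = file with format text.
unused_disks() {
    zp=$(mktemp) && fm=$(mktemp) || return 1
    zpool_disks "$1" > "$zp"
    format_disks "$2" > "$fm"
    LC_ALL=C comm -13 "$zp" "$fm"   # keep lines found only in the format list
    rm -f "$zp" "$fm"
}

# Example with saved output:
#   zpool status > /tmp/zpool.txt; echo | format > /tmp/format.txt
#   unused_disks /tmp/zpool.txt /tmp/format.txt
```

Pool members given as slices (for example a root pool on c0t0d0s0) will not match the whole-disk names from format, so such disks show up as unused and need to be excluded by eye.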
Cindy.Swearingen at Sun.COM
2009-Aug-05 14:58 UTC
[zfs-discuss] Sol10u7: can't "zpool remove" missing hot spare
Hi Will,

Since no workaround is provided in the CR, I don't know if importing on a more recent OpenSolaris release and trying to remove it will work.

I will simulate this error, try this approach, and get back to you.

Thanks,
Cindy
Cindy.Swearingen at Sun.COM
2009-Aug-05 19:36 UTC
[zfs-discuss] Sol10u7: can't "zpool remove" missing hot spare
Hi Will,

I simulated this issue on s10u7 and then imported the pool on a current Nevada release. The original issue remains: you can't remove a spare device that no longer exists.

My sense is that the bug fix prevents the spare from getting messed up in the first place when the device IDs change, but once the original device is gone, you still can't remove the spare.

I think the only resolution is to put the device back, after which you can remove the spare. This was my resolution during testing. But in your case, the original device has been renamed.

I don't think the ghost spare causes a problem except aesthetically. I'm no expert in this error scenario, so I will check with someone else (when he gets back from vacation, and then I'm on vacation).

Thanks,
Cindy
Kyle McDonald
2009-Aug-05 20:04 UTC
[zfs-discuss] Sol10u7: can't "zpool remove" missing hot spare
Will Murnane wrote:
> So, how can I convince the pool to release its grasp on c2t0d8?

Have you tried making a sparse file with mkfile in /tmp, using zpool replace to swap c2t0d8 for the file, and then using zpool remove on the file?

I don't know if it will work, but at least at the time of the remove, the device will exist.

-Kyle
Cindy.Swearingen at Sun.COM
2009-Aug-06 21:20 UTC
[zfs-discuss] Sol10u7: can't "zpool remove" missing hot spare
Hi Kyle,

Except that in the case of spares, you can't replace them. You'll see a message like the one below.

Cindy

# zpool create pool mirror c1t0d0 c1t1d0 spare c1t5d0
# zpool status
  pool: pool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
        spares
          c1t5d0    AVAIL

# zpool replace pool c1t5d0 c2t5d0
cannot replace c1t5d0 with c2t5d0: device is reserved as a hot spare