Jill Manfield
2007-Dec-13 17:16 UTC
[zfs-discuss] How to properly tell zfs of new GUID <controller numbers> after a firmware upgrade changes the IDs
My customer's ZFS pools sit on a 6540 disk array whose firmware upgrade changed the GUIDs of its LUNs, so we need a procedure to let ZFS know they changed. They are getting errors as if they had replaced drives. But I need to make sure you know they have not "replaced" any drives, and no drives have failed or are "bad". As such, they have no interest in wiping any disks clean as indicated in info doc 88130.

Some background from the customer:

We have a large 6540 disk array, on which we have configured a series of large RAID LUNs. A few days ago, Sun sent a technician to upgrade the firmware of this array, which worked fine but which had the deleterious effect of changing the "Volume IDs" associated with each LUN. So, the resulting LUNs now appear to our Solaris 10 host (under MPxIO) as disks in /dev/rdsk with different 'target' components than they had before.

Before the firmware upgrade we took the precaution of creating duplicate LUNs on a different 6540 disk array, and using these to mirror each of our ZFS pools (as protection in case the firmware upgrade corrupted our LUNs).

Now, we simply want to ask ZFS to find the devices under their new targets, recognize that they are existing zpool components, and have it correct the configuration of each pool. This would be similar to having Veritas VxVM re-scan all disks with vxconfigd in the event of a "controller renumbering" event.

The proper ZFS method for doing this, I believe, is simply:

  zpool export mypool
  zpool import mypool

Indeed, this has worked fine for me a few times today, and several of our pools are now back to their original mirrored configuration.

Here is a specific example, for the pool "ospf". The zpool status after the upgrade:

  diamond:root[1105]->zpool status ospf
    pool: ospf
   state: DEGRADED
  status: One or more devices could not be opened.  Sufficient replicas exist
          for the pool to continue functioning in a degraded state.
  action: Attach the missing device and online it using 'zpool online'.
     see: http://www.sun.com/msg/ZFS-8000-D3
   scrub: resilver completed with 0 errors on Tue Dec 11 18:26:53 2007
  config:

          NAME                                        STATE     READ WRITE CKSUM
          ospf                                        DEGRADED     0     0     0
            mirror                                    DEGRADED     0     0     0
              c27t600A0B8000292B0200004BDC4731A7B8d0  UNAVAIL      0     0     0  cannot open
              c27t600A0B800032619A0000093747554A08d0  ONLINE       0     0     0

  errors: No known data errors

This is due to the fact that the LUN which used to appear as c27t600A0B8000292B0200004BDC4731A7B8d0 is now actually c27t600A0B8000292B0200004D5B475E6E90d0. It's the same LUN, but since the firmware changed the Volume ID, the target portion is different.

Rather than treating this as a "replaced" disk (which would incur an entire mirror resilvering, and would require the "trick" you sent of obliterating the disk label so the "in use" safeguard could be avoided), we simply want to ask ZFS to re-read its configuration to find this disk. So we do this:

  diamond:root[1110]->zpool export -f ospf
  diamond:root[1111]->zpool import ospf

and sure enough:

  diamond:root[1112]->zpool status ospf
    pool: ospf
   state: ONLINE
  status: One or more devices is currently being resilvered.  The pool will
          continue to function, possibly in a degraded state.
  action: Wait for the resilver to complete.
   scrub: resilver in progress, 0.16% done, 2h53m to go
  config:

          NAME                                        STATE     READ WRITE CKSUM
          ospf                                        ONLINE       0     0     0
            mirror                                    ONLINE       0     0     0
              c27t600A0B8000292B0200004D5B475E6E90d0  ONLINE       0     0     0
              c27t600A0B800032619A0000093747554A08d0  ONLINE       0     0     0

  errors: No known data errors

(Note that it has self-initiated a resilvering, since in this case the mirror has been changed by users since the firmware upgrade.)

The problem that Robert had was that when he initiated an export of a pool (called "bgp") it froze for quite some time. The corresponding "import" of the same pool took 12 hours to complete. I have not been able to replicate this myself, but that was the essence of the problem.

So again, we do NOT want to "zero out" any of our disks, and we are not trying to forcibly use "replaced" disks. We simply want ZFS to re-read the devices under /dev/rdsk and update each pool with the correct disk targets.

If you can confirm that a simple export/import is the proper procedure for this (followed by a "zpool clear" once the resulting resilvering finishes), I would appreciate it. And, if you can postulate what may have caused the "freeze" that Robert noticed, that would put our minds at ease.

TIA. Any assistance on this, or pointers to helpful documentation, would be greatly appreciated.

--
S U N  M I C R O S Y S T E M S  I N C.

Jill Manfield - TSE-OS Administration Group
email: jill.manfield at sun.com
phone: (800)USA-4SUN (Reference your case number)
address: 1617 Southwood Drive, Nashua, NH 03063
mailstop: NSH-01-B287

OS Support Team 9AM to 6PM EST
Manager joel.fontenot at sun.com x74110
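To summarize the sequence under discussion, here is a minimal sketch using the hypothetical pool name mypool, and assuming the renumbered LUNs are already visible to the host under MPxIO:

  zpool export mypool     # unmounts the pool's filesystems and releases its devices
  zpool import mypool     # re-scans /dev/dsk (or a directory given with -d) and picks up the new device paths
  zpool status mypool     # watch the self-initiated resilver until it completes
  zpool clear mypool      # then clear the error counts left over from the renumbering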
Shawn Ferry
2007-Dec-13 17:37 UTC
[zfs-discuss] How to properly tell zfs of new GUID <controller numbers> after a firmware upgrade changes the IDs
Jill,

I was recently looking for a similar solution to try and reconnect a renumbered device while the pool was live, e.g.:

  zpool online mypool <old target> <old target at new location>

As in zpool replace, but with the indication that this isn't a new device.

What I have been doing to deal with the renumbering is exactly the export, import and clear, although I have been dealing with significantly smaller devices and can't speak to the delay issues.

Shawn

On Dec 13, 2007, at 12:16 PM, Jill Manfield wrote:

> If you can confirm that a simple export/import is the proper procedure for
> this (followed by a "clear" once the resulting resilvering finishes), I
> would appreciate it.

--
Shawn Ferry        shawn.ferry at sun.com
Senior Primary Systems Engineer
Sun Managed Operations
571.291.4898
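For anyone who wants to sanity-check a renumbered LUN before exporting and importing, one approach (a sketch, not taken from the thread; the device path below is the new ospf target quoted above, and the s0 slice assumes ZFS was given the whole LUN and wrote its usual EFI label) is to dump the ZFS labels on the device and compare them against zpool status on the surviving mirror half:

  # print the ZFS vdev labels stored on the device
  zdb -l /dev/rdsk/c27t600A0B8000292B0200004D5B475E6E90d0s0
  # the labels should report the expected pool name (ospf here) and the same
  # pool GUID as the other half of the mirror, confirming it is the same LUN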