Zpool split is a wonderful feature and it seems to work well, and the choice of which disk got which name was perfect! But there seems to be an odd anomaly (at least with b132).

Started with c0t1d0s0 running b132 (root pool is called rpool)
Attached c0t0d0s0 and waited for it to resilver
Rebooted from c0t0d0s0
zpool split rpool spool
Rebooted from c0t0d0s0, both rpool and spool were mounted
Rebooted from c0t1d0s0, only rpool was mounted

It seems to me for consistency rpool should not have been mounted when booting from c0t0d0s0; however that's pretty harmless. But:

Rebooted from c0t0d0s0 - a couple of verbose errors on the console...

# zpool status rpool
  pool: rpool
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to
        continue functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         UNAVAIL      0     0     0  insufficient replicas
          mirror-0    UNAVAIL      0     0     0  insufficient replicas
            c0t1d0s0  FAULTED      0     0     0  corrupted data
            c0t0d0s0  FAULTED      0     0     0  corrupted data

# zpool status spool
  pool: spool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        spool       ONLINE       0     0     0
          c0t0d0s0  ONLINE       0     0     0

It seems that ZFS thinks c0t0d0s0 is still part of rpool as well as being a separate pool (spool).

# zpool export rpool
cannot open 'rpool': I/O error

This worked anyway, since zpool list doesn't show rpool any more.

Reboot c0t1d0s0 - no problem (no spool)
Reboot c0t0d0s0 - no problem (no rpool)

The workaround seems to be to export rpool the first time you boot c0t0d0s0. No big deal, but it's a bit scary when it happens. Has this been fixed in a later release?

Thanks -- Frank
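Condensed into commands, the sequence and the workaround look roughly like this (a sketch only, reconstructed from the report above; the device and pool names are specific to this system):

    # zpool attach rpool c0t1d0s0 c0t0d0s0    (mirror the root disk, then wait for the resilver)
    # zpool split rpool spool                 (c0t0d0s0 becomes the new single-disk pool "spool")
    (reboot from c0t0d0s0: spool is ONLINE, but a stale rpool shows up as UNAVAIL)
    # zpool export rpool                      (the workaround: drop the stale rpool entry)
    (subsequent reboots from either disk come up clean)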
On Sat, 27 Mar 2010, Frank Middleton wrote:

> Started with c0t1d0s0 running b132 (root pool is called rpool)
> Attached c0t0d0s0 and waited for it to resilver
> Rebooted from c0t0d0s0
> zpool split rpool spool
> Rebooted from c0t0d0s0, both rpool and spool were mounted
> Rebooted from c0t1d0s0, only rpool was mounted

OK, I will try to reproduce that here and see what happens.

Regards,
markm
OK, I see what the problem is: the /etc/zfs/zpool.cache file. When the pool was split, the zpool.cache file was also split - and the split happens prior to the config file being updated. So, after booting off the split side of the mirror, zfs attempts to mount rpool based on the information in the zpool.cache file (which still shows it as a mirror of c0t0d0s0 and c0t1d0s0).

The fix would be to remove the appropriate entry from the split-off pool's zpool.cache file. Easy to say, not so easy to do. I have filed CR 6939334 to track this issue.
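For anyone who wants to confirm the stale entry on their own system, the cached configuration can be dumped directly; a quick sketch (the exact fields vary by build):

    # zdb -C        (prints the pool configurations cached in /etc/zfs/zpool.cache)

On the split-off boot environment this still lists rpool as a two-way mirror of c0t1d0s0 and c0t0d0s0, which is what the failed import in the zpool status output above reflects.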
Why do we still need the "/etc/zfs/zpool.cache" file??? (I could understand that it was useful when zpool import was slow.)

zpool import is now multi-threaded (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844191), hence a lot faster, and each disk contains the hostname (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6282725); if a pool contains the same hostname as the server, then import it.

i.e. this bug should not be a problem any more with a multi-threaded zpool import: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6737296

HA Storage should be changed to just do a "zpool import -h mypool" instead of using a private zpool.cache file (-h meaning ignore whether the pool was imported by a different host; and maybe a noautoimport property is needed on a zpool so clustering software can decide to import it by hand, as it does today).

And therefore this zpool split problem would be fixed.
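For comparison, cache-less discovery already exists for pools that are not in the cache file; the existing knobs look roughly like this ("tank" is just a placeholder name, and -h is the option proposed above, which does not exist today):

    # zpool import                      (scans the default device directory and lists importable pools)
    # zpool import -d /dev/dsk tank     (restricts the scan to a given directory of devices)
    # zpool import -f tank              (forces the import even if the labels say another host used the pool last)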
On Wed, 31 Mar 2010, Damon Atkins wrote:

> Why do we still need the "/etc/zfs/zpool.cache" file???

The cache file contains a list of pools to import, not a list of pools that exist. If you do a "zpool export foo" and then reboot, we don't want foo to be imported after boot completes.

Unfortunately, the problem goes well beyond just zpool.cache. There are several configuration files which use the pool name (e.g. dumpadm, vfstab for swap), not to mention beadm configuration for opensolaris or live upgrade configuration for Solaris 10. To solve it properly would require not an insignificant amount of work. It's solvable, yes, but it's well beyond the scope of the original "zpool split" putback.

Regards,
markm
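The name dependencies Mark mentions are easy to see on a running system; for example (a sketch, assuming the stock Solaris file locations):

    # grep rpool /etc/vfstab    (the swap entry typically points at /dev/zvol/dsk/rpool/swap)
    # dumpadm                   (the "Dump device:" line names a zvol under the original pool)
    # swap -l                   (lists the swap zvol by its /dev/zvol path)

Rename or split the pool and every one of these references goes stale.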
On 03/31/10 03:50 AM, Damon Atkins wrote:

> Why do we still need the "/etc/zfs/zpool.cache" file???
> (I could understand that it was useful when zpool import was slow.)
>
> zpool import is now multi-threaded (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844191), hence a lot faster, and each disk contains the hostname (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6282725); if a pool contains the same hostname as the server, then import it.
>
> i.e. this bug should not be a problem any more with a multi-threaded zpool import: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6737296
>
> HA Storage should be changed to just do a "zpool import -h mypool" instead of using a private zpool.cache file (-h meaning ignore whether the pool was imported by a different host; and maybe a noautoimport property is needed on a zpool so clustering software can decide to import it by hand, as it does today).
>
> And therefore this zpool split problem would be fixed.

The problem with splitting a root pool goes beyond the issue of the zpool.cache file. If you look at the comments for 6939334 <http://monaco.sfbay.sun.com/detail.jsf?cr=6939334>, you will see other files whose content is not correct when a root pool is renamed or split.

I'm not questioning your logic about whether zpool.cache is still needed. I'm only pointing out that eliminating the zpool.cache file would not enable root pools to be split. More work is required for that.

Lori
On Mar 31, 2010, at 2:50 AM, Damon Atkins wrote:

> Why do we still need the "/etc/zfs/zpool.cache" file???
> (I could understand that it was useful when zpool import was slow.)

Yes. Imagine the case where your server has access to hundreds of LUs. If you must probe each one, then booting can take a long time. If you go back in history you will find many cases where probing all LUs at boot was determined to be a bad thing.

> zpool import is now multi-threaded (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844191), hence a lot faster, and each disk contains the hostname (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6282725); if a pool contains the same hostname as the server, then import it.
>
> i.e. this bug should not be a problem any more with a multi-threaded zpool import: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6737296
>
> HA Storage should be changed to just do a "zpool import -h mypool" instead of using a private zpool.cache file (-h meaning ignore whether the pool was imported by a different host; and maybe a noautoimport property is needed on a zpool so clustering software can decide to import it by hand, as it does today).
>
> And therefore this zpool split problem would be fixed.

There is also a use case where the storage array makes a block-level copy of a LU. It would be a bad thing to discover that on a probe and attempt import.

-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
On 03/31/10 12:21 PM, lori.alt wrote:

> The problem with splitting a root pool goes beyond the issue of the
> zpool.cache file. If you look at the comments for 6939334
> <http://monaco.sfbay.sun.com/detail.jsf?cr=6939334>, you will see other
> files whose content is not correct when a root pool is renamed or split.

6939334 seems to be inaccessible outside of Sun. Could you list the comments here?

Thanks
On 03/31/10 10:42 AM, Frank Middleton wrote:

> On 03/31/10 12:21 PM, lori.alt wrote:
>
>> The problem with splitting a root pool goes beyond the issue of the
>> zpool.cache file. If you look at the comments for 6939334
>> <http://monaco.sfbay.sun.com/detail.jsf?cr=6939334>, you will see other
>> files whose content is not correct when a root pool is renamed or split.
>
> 6939334 seems to be inaccessible outside of Sun. Could you
> list the comments here?
>
> Thanks

Here they are:

> Other issues:
>
> * Swap is still pointing to rpool because /etc/vfstab is never updated.
>
> * Likewise, dumpadm still has dump zvols configured with the original pool.
>
> * The /{pool}/boot/menu.lst (on sparc), and /{pool}/boot/grub/menu.lst (on x86), still reference the original pool's bootfs. Note that the 'bootfs' property in the pool itself is actually correct, because we store the object number and not the name.
>
> While each one of these issues is individually fixable, there's no way to prevent new issues coming up in the future, thus breaking zpool split. It might be more advisable to prevent splitting of root pools.
>
> *** (#2 of 3): 2010-03-30 18:48:54 GMT+00:00 mark.musante at sun.com
>
> Yes, these look like the kind of issues that flash archive install had to solve: all the tweaks that need to be made to a root file system to get it to adjust to living on different hardware. In addition to the ones listed above, there are all the device specific files in /etc/path_to_inst, /devices, and so on. This is not a trivial problem. Cloning root pools by the split mechanism is more of a project in its own right. Is zfs split good for anything related to root disks? I can't think of a use. If there is a need for a disaster recovery disk, it's probably best to just remove one of the mirrors (without doing a split operation) and stash it for later use.
>
> *** (#3 of 3): 2010-03-30 20:21:57 GMT+00:00 lori.alt at sun.com

Lori
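None of this is automated today, but each item in the CR can be patched by hand on the split-off side; a rough, untested sketch, assuming the new pool is named spool as in this thread and is the one currently booted:

    # vi /etc/vfstab                          (swap: change /dev/zvol/dsk/rpool/swap to /dev/zvol/dsk/spool/swap)
    # dumpadm -d /dev/zvol/dsk/spool/dump     (re-points the dump device at the zvol in the new pool)
    # vi /spool/boot/grub/menu.lst            (x86: fix any entries naming the old pool; on sparc edit /spool/boot/menu.lst)

The bootfs pool property itself needs no change, for the reason given in the CR comment above.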
I assume the swap, dumpadm, grub issues are because the pool has a different name now, but is it still a problem if you take it to a *different system*, boot off a CD, and change it back to rpool? (Which is most likely unsupported, i.e. no help to get it working.)

Over 10 years ago (way before flash archive existed) I developed a script, used after splitting a mirror, which would remove most of the device tree, clean up path_to_inst, etc., so it looked like the OS was just installed and about to do the reboot without the install CD. (Everything was still in there except for hardware-specific stuff; I no longer have the script and most likely would not do it again because it's not a supported install method.)

I still had to boot from CD on the new system and create the dev tree before booting off the disk for the first time, and then fix vfstab (but the fix-vfstab step should be gone with a zfs rpool).

It would be nice for Oracle/Sun to produce a separate script which resets system/devices back to an install-like beginning, so if you move an OS disk with the current password file and software from one system to another, it rebuilds the device tree on the new system.

From memory (updated for zfs), something like:

    zpool split rpool newrpool
    mount newrpool
    remove newrpool/dev and newrpool/devices of all non-package content (i.e. dynamically created content)
    clean up newrpool/etc/path_to_inst
    create /newrpool/reconfigure
    remove all previous snapshots in newrpool
    update beadm info inside newrpool
    ensure grub is installed on the disk
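A very rough shell sketch of those steps, for illustration only (untested; "newrpool", the boot-environment name, and the target disk are placeholders, and the path_to_inst/devices cleanup is deliberately hand-waved because it is the hard part):

    zpool split rpool newrpool
    zpool import -R /mnt newrpool            # alternate root keeps the new pool out of the live namespace
    BE=mybe                                  # placeholder boot-environment name; substitute your own
    zfs mount newrpool/ROOT/$BE              # root datasets are canmount=noauto, so mount explicitly
    rm -rf /mnt/dev/* /mnt/devices/*         # dynamically created device nodes; rebuilt on a reconfigure boot
    # ...edit /mnt/etc/path_to_inst here to drop bindings for hardware the target will not have...
    touch /mnt/reconfigure                   # force a reconfiguration boot the first time the copy boots
    for snap in $(zfs list -H -o name -t snapshot -r newrpool); do
            zfs destroy "$snap"              # drop snapshots carried over from the original pool
    done
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0   # x86 boot blocks; sparc uses installboot
    zpool export newrpool                    # hand the disk to the target system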
On 31/03/2010 16:19, Mark J Musante wrote:

> On Wed, 31 Mar 2010, Damon Atkins wrote:
>
>> Why do we still need the "/etc/zfs/zpool.cache" file???
>
> The cache file contains a list of pools to import, not a list of pools
> that exist. If you do a "zpool export foo" and then reboot, we don't
> want foo to be imported after boot completes.
>
> Unfortunately, the problem goes well beyond just zpool.cache. There are
> several configuration files which use the pool name (e.g. dumpadm,
> vfstab for swap), not to mention beadm configuration for opensolaris or
> live upgrade configuration for Solaris 10.

beadm doesn't seem to care, since I don't believe it stores the pool names anywhere. Live upgrade on the other hand does, as do all the other issues you highlighted.

--
Darren J Moffat
> It would be nice for Oracle/Sun to produce a separate
> script which resets system/devices back to an install-like
> beginning, so if you move an OS disk with the current
> password file and software from one system to another,
> it rebuilds the device tree on the new system.

You mean /usr/sbin/sys-unconfig?
You might want to take this issue over to caiman-discuss at opensolaris.org, because this is more of an installation/management issue than a zfs issue. Other than providing a mechanism for updating the zpool.cache file, the actions listed below are not directly related to zfs.

I believe that the Caiman team is looking at implementing a mass provisioning and disaster recovery mechanism (functionally similar to flash archives in the legacy Solaris installer). Pool splitting could be another tool in their toolbox for accomplishing that goal.

Lori

On 03/31/10 06:41 PM, Damon Atkins wrote:

> I assume the swap, dumpadm, grub issues are because the pool has a different name now, but is it still a problem if you take it to a *different system*, boot off a CD, and change it back to rpool? (Which is most likely unsupported, i.e. no help to get it working.)
>
> Over 10 years ago (way before flash archive existed) I developed a script, used after splitting a mirror, which would remove most of the device tree, clean up path_to_inst, etc., so it looked like the OS was just installed and about to do the reboot without the install CD. (Everything was still in there except for hardware-specific stuff; I no longer have the script and most likely would not do it again because it's not a supported install method.)
>
> I still had to boot from CD on the new system and create the dev tree before booting off the disk for the first time, and then fix vfstab (but the fix-vfstab step should be gone with a zfs rpool).
>
> It would be nice for Oracle/Sun to produce a separate script which resets system/devices back to an install-like beginning, so if you move an OS disk with the current password file and software from one system to another, it rebuilds the device tree on the new system.
>
> From memory (updated for zfs), something like:
>     zpool split rpool newrpool
>     mount newrpool
>     remove newrpool/dev and newrpool/devices of all non-package content (i.e. dynamically created content)
>     clean up newrpool/etc/path_to_inst
>     create /newrpool/reconfigure
>     remove all previous snapshots in newrpool
>     update beadm info inside newrpool
>     ensure grub is installed on the disk
> You mean /usr/sbin/sys-unconfig?

No, it does not reset a system back far enough. You're still left with the original path_to_inst and the device tree. E.g. take a disk to a different system and, without cleaning up the system first, the first disk might end up being sd10 and c15t0d0s0 instead of sd0 and c0. I.e. the cleanup means removing /etc/path_to_inst and most of what is in the device tree.
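For context, the leftover state being described is easy to inspect (a sketch; the actual entries depend on the hardware the OS was installed on):

    # grep '"sd"' /etc/path_to_inst    (instance numbers bound on the original hardware are recorded here)
    # ls -l /dev/dsk/c0t0d0s0          (the /dev links resolve to /devices physical paths from the old machine)
    # touch /reconfigure               (forces a reconfiguration boot, but existing path_to_inst bindings persist)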
>>>>> "la" == Lori Alt <lori.alt at oracle.com> writes:la> I''m only pointing out that eliminating the zpool.cache file la> would not enable root pools to be split. More work is la> required for that. makes sense. All the same, please do not retaliate against the bug-opener by adding a lazy-assertion to prevent rpools from being split: this type of brittleness, ex. around all the many disk-labeling programs, is a large part of what makes Solaris systems feel flakey and unwelcoming to those who''ve used Linux, BSD, or Mac OS X. and AFAICT there is not much of it in the ZFS boot support so far---it''s an uncluttered architecture that''s quite friendly to creative abuse and impatient hacking. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100401/b694c31a/attachment.bin>