Paul B. Henson
2009-Aug-28  01:20 UTC
[zfs-discuss] live upgrade with lots of zfs filesystems
Well, so I'm getting ready to install the first set of patches on my x4500 since we deployed into production, and have run into an unexpected snag.

I already knew that with about 5-6k file systems the reboot cycle was going to be over an hour (not happy about it, but I knew about it and planned for it). However, I went to create a new boot environment to install the patches into, and so far that's been running for about an hour and a half :(, which was not expected or planned for.

First, it looks like the ludefine script spent about 20 minutes iterating through all of my zfs file systems, then something named lupi_bebasic ran for over an hour, then it looks like it mounted all of my zfs filesystems under /.alt.tmp.b-nAe.mnt, and now it looks like it is unmounting all of them.

I hadn't noticed this before on my test system (which has only a handful of filesystems), but evidently when I get to the point of using lumount to mount the boot environment for patching, it's going to mount all of my zfs file systems under the alternative root again, and then need to unmount them all again after I'm done patching, which is probably going to add another hour or two. I don't think I'm going to make my downtime window :(, and will probably need to reschedule the patching. I never considered I might have to start the patch process six hours before the window.

I poked around a bit, but have not come across any way to exclude zfs filesystems that are not part of the boot OS pool from the copy and mount process. I'm really hoping I'm just being stupid and missing something blindingly obvious. Given a boot pool named ospool and a data pool named export, is there any way to make live upgrade completely ignore the data pool? There is no need for my 6k user file systems to be mounted in the alternative environment during patching. I only want the file systems in the ospool copied, processed, and mounted.

<fingers crossed>

Thanks...

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
Paul,

You need to exclude all the file systems that are not the "OS". My S10 virtual machine is not booted, but from memory you can put all the "excluded" file systems in a file and use -f.

You used to have to do this if there was a DVD in the drive; otherwise /cdrom got copied to the new boot environment. I know this because I logged an RFE when Live Upgrade first appeared, and it was put into state Deferred because the workaround is simply to exclude it. I think it did get fixed in a later release, however.

trevor
Paul B. Henson
2009-Aug-28  02:25 UTC
[zfs-discuss] live upgrade with lots of zfs filesystems
On Thu, 27 Aug 2009, Trevor Pretty wrote:

> My S10 Virtual machine is not booted but you can put all the "excluded"
> file systems in a file and use -f from memory.

Unfortunately, I wasn't that stupid. I saw the -f option, but it's not applicable to ZFS root:

     -f exclude_list_file

         Use the contents of exclude_list_file to exclude specific
         files (including directories) from the newly created BE.
         exclude_list_file contains a list of files and directories,
         one per line. If a line item is a file, only that file is
         excluded; if a directory, that directory and all files
         beneath that directory, including subdirectories, are
         excluded.

         This option is not supported when the source BE is on a ZFS
         file system.

After it finished unmounting everything from the alternative root, it seems to have spawned *another* lupi_bebasic process, which has eaten up 62 minutes of CPU time so far. Evidently it's doing a lot of string comparisons (per truss):

    /1@1:   <- libc:strcmp() = 0
    /1@1:   -> libc:strcmp(0x86fceec, 0xfefa1218)
    /1@1:   <- libc:strcmp() = 0
    /1@1:   -> libc:strcmp(0x86fd534, 0xfefa1218)
    /1@1:   <- libc:strcmp() = 0
    /1@1:   -> libc:strcmp(0x86fdccc, 0xfefa1218)
    /1@1:   <- libc:strcmp() = 0
    /1@1:   -> libc:strcmp(0x86fdcfc, 0xfefa1218)
    /1@1:   <- libc:strcmp() = 0
    /1@1:   -> libc:strcmp(0x86fec84, 0xfefa1218)
    /1@1:   <- libc:strcmp() = 0
    /1@1:   -> libc:strcmp(0x86fecb4, 0xfefa1218)
    /1@1:   <- libc:strcmp() = 0

The first one finished in a bit over an hour; hopefully this one's about done too and there isn't any more stuff to do.

Thanks...
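To check whether strcmp really dominates a trace like the one above, the call entries can be tallied per function. This is only a rough sketch: it assumes the trace was saved to a file and that each call entry contains a "->" token followed by "library:function(", as in the excerpt. The function name tally_calls and the file name in the usage line are made up for illustration.

```shell
# tally_calls FILE
# Counts library-call entries per function in saved truss output and prints
# "<count> <library:function>", busiest first. Return lines ("<-") are skipped.
# Assumption: call entries look like "... -> libc:strcmp(0x..., 0x...)".
tally_calls() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "->") {
                name = $(i + 1)          # e.g. "libc:strcmp(0x86fceec,"
                sub(/\(.*/, "", name)    # strip the argument list
                n[name]++
                break
            }
        }
    }
    END { for (f in n) print n[f], f }' "$1" | sort -rn
}
```

Usage would be something like `tally_calls /tmp/lupi_bebasic.truss` (a hypothetical capture file); a single line showing libc:strcmp with a very large count would confirm the impression from watching the output scroll by.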
Paul B. Henson
2009-Aug-28  05:59 UTC
[zfs-discuss] live upgrade with lots of zfs filesystems
On Thu, 27 Aug 2009, Paul B. Henson wrote:

> However, I went to create a new boot environment to install the patches
> into, and so far that's been running for about an hour and a half :(,
> which was not expected or planned for.
[...]
> I don't think I'm going to make my downtime window :(, and will probably
> need to reschedule the patching. I never considered I might have to start
> the patch process six hours before the window.

Well, so far lucreate took 3.5 hours, lumount took 1.5 hours, applying the patches took all of 10 minutes, luumount took about 20 minutes, and luactivate has been running for about 45 minutes. I'm assuming it will probably take at least the 1.5 hours of the lumount (particularly considering it appears to be running a lumount process under the hood) if not the 3.5 hours of lucreate. Add in the 1-1.5 hours to reboot, and, well, so much for patches this maintenance window.

The lupi_bebasic process seems to be the time killer here. Not sure what it's doing, but it spent 75 minutes running strcmp. Pretty much nothing but strcmp. 75 CPU minutes running strcmp???? I took a look for the source, but I guess that component's not a part of opensolaris, or at least I couldn't find it.

Hopefully I can figure out how to make this perform a little more acceptably before our next maintenance window.
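For a sense of scale, the step times reported above can simply be added up. The sketch below uses only the figures from the message, converted to minutes (lucreate 210, lumount 90, patching 10, luumount 20, luactivate 45 so far); the helper name total_minutes is made up for illustration, and the reboot estimate is left out because it was given as a range.

```shell
# total_minutes MIN...
# Sums whole-minute durations and prints the total as hours and minutes.
total_minutes() {
    total=0
    for m in "$@"; do
        total=$((total + m))
    done
    printf '%dh%02dm\n' $((total / 60)) $((total % 60))
}

# lucreate, lumount, patching, luumount, luactivate (still running)
total_minutes 210 90 10 20 45    # prints 6h15m
```

That is already more than six hours before luactivate finishes and before the 1-1.5 hour reboot, of which the patching itself accounts for ten minutes.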
Casper.Dik at Sun.COM
2009-Aug-28  08:32 UTC
[zfs-discuss] live upgrade with lots of zfs filesystems
> Well, so far lucreate took 3.5 hours, lumount took 1.5 hours, applying the
> patches took all of 10 minutes, luumount took about 20 minutes, and
> luactivate has been running for about 45 minutes. I'm assuming it will
> probably take at least the 1.5 hours of the lumount (particularly
> considering it appears to be running a lumount process under the hood) if
> not the 3.5 hours of lucreate. Add in the 1-1.5 hours to reboot, and, well,
> so much for patches this maintenance window.
>
> The lupi_bebasic process seems to be the time killer here. Not sure what
> it's doing, but it spent 75 minutes running strcmp. Pretty much nothing but
> strcmp. 75 CPU minutes running strcmp???? I took a look for the source but
> I guess that component's not a part of opensolaris, or at least I couldn't
> find it.
>
> Hopefully I can figure out how to make this perform a little more
> acceptably before our next maintenance window.

Do you have a lot of files in /etc/mnttab, including nfs filesystems mounted from "server1,server2:/path"?

And you're using lucreate for a ZFS root? It should be "quick"; we are changing a number of things in Solaris 10 update 8 and we hope it will be faster.

Casper
On Thu, Aug 27, 2009 at 10:59:16PM -0700, Paul B. Henson wrote:

> On Thu, 27 Aug 2009, Paul B. Henson wrote:
>
> > However, I went to create a new boot environment to install the patches
> > into, and so far that's been running for about an hour and a half :(,
> > which was not expected or planned for.
> [...]
> > I don't think I'm going to make my downtime window :(, and will probably
> > need to reschedule the patching. I never considered I might have to start
> > the patch process six hours before the window.
>
> Well, so far lucreate took 3.5 hours, lumount took 1.5 hours, applying the
> patches took all of 10 minutes, luumount took about 20 minutes, and
> luactivate has been running for about 45 minutes. I'm assuming it will

Have a look at http://iws.cs.uni-magdeburg.de/~elkner/luc/lu-5.10.patch or http://iws.cs.uni-magdeburg.de/~elkner/luc/lu-5.11.patch ...

So first install the most recent LU patches and then one of the above. Since I'm still on vacation (for ~8 weeks), I haven't checked whether there are new LU patches out there and whether the patches still match (usually they do). If not, adjusting the files manually shouldn't be a problem ;-)

There are also versions for pre svn_b107 and pre 121430-36,121431-37: see http://iws.cs.uni-magdeburg.de/~elkner/

More info: http://iws.cs.uni-magdeburg.de/~elkner/luc/lutrouble.html#luslow

Have fun,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
Paul B. Henson
2009-Aug-29  04:56 UTC
[zfs-discuss] live upgrade with lots of zfs filesystems
On Fri, 28 Aug 2009 Casper.Dik at Sun.COM wrote:

> > luactivate has been running for about 45 minutes. I'm assuming it will
> > probably take at least the 1.5 hours of the lumount (particularly
> > considering it appears to be running a lumount process under the hood) if
> > not the 3.5 hours of lucreate.

Eeeek, the luactivate command ended up taking about *7 hours* to complete. And I'm not sure it was even successful; output excerpts are at the end of this message.

> Do you have a lot of files in /etc/mnttab, including nfs filesystems
> mounted from "server1,server2:/path"?

There's only one nfs filesystem in vfstab, which is always mounted. User home directories are automounted and would be in mnttab if accessed, but during the lu process no users were on the box. On the other hand, there are a *lot* of zfs filesystems in mnttab:

    # grep zfs /etc/mnttab | wc -l
        8145

> And you're using lucreate for a ZFS root? It should be "quick"; we are
> changing a number of things in Solaris 10 update 8 and we hope it will be
> faster/

lucreate on a system with *only* an os root pool is blazing (the magic of clones). The problem occurs when my data pool (with 6k odd filesystems) is also there. The live upgrade process is analyzing all 6k of those filesystems, mounting them all in the alternate root, unmounting them all, and who knows what else. This is totally wasted effort; those filesystems have nothing to do with the OS or patching, and I'm really hoping that they can just be completely ignored.

So, after 7 hours, here is the last bit of output from luactivate. Other than taking forever and a day, all of the output up to this point seemed normal. The BE s10u6 is neither the currently active BE nor the one being made active, but these errors have me concerned something _bad_ might happen if I reboot :(. Any thoughts?

    Modifying boot archive service
    Propagating findroot GRUB for menu conversion.
    ERROR: Read-only file system: cannot create mount point </.alt.s10u6/export/group/ceis>
    ERROR: failed to create mount point </.alt.s10u6/export/group/ceis> for file system <export/group/ceis>
    ERROR: unmounting partially mounted boot environment file systems
    ERROR: No such file or directory: error unmounting <ospool/ROOT/s10u6>
    ERROR: umount: warning: ospool/ROOT/s10u6 not in mnttab
    umount: ospool/ROOT/s10u6 no such file or directory
    ERROR: cannot unmount <ospool/ROOT/s10u6>
    ERROR: cannot mount boot environment by name <s10u6>
    ERROR: Failed to mount BE <s10u6>.
    ERROR: Failed to mount BE <s10u6>.
    Cannot propagate file </etc/lu/installgrub.findroot> to BE
    File propagation was incomplete
    ERROR: Failed to propagate installgrub
    ERROR: Could not propagate GRUB that supports the findroot command.
    Activation of boot environment <patch-20090817> successful.

According to lustatus everything is good, but <shiver>... These boxes have only been in full production about a month; it would not be good for them to die during the first scheduled patches.

    # lustatus
    Boot Environment           Is       Active Active    Can    Copy
    Name                       Complete Now    On Reboot Delete Status
    -------------------------- -------- ------ --------- ------ ----------
    s10u6                      yes      no     no        yes    -
    s10u6-20090413             yes      yes    no        no     -
    patch-20090817             yes      no     yes       no     -

Thanks...
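To see how the zfs entries in mnttab split between the OS pool and the data pool, they can be grouped by pool name. This is a minimal sketch, not anything Live Upgrade itself does: it assumes the usual mnttab field order (special, mount point, fstype, options, time), that the pool name is the part of the dataset name before the first "/", and that no mount point contains whitespace. The function name count_zfs_by_pool is made up for illustration.

```shell
# count_zfs_by_pool FILE
# Prints "<pool> <count>" for every pool that has zfs entries in a
# mnttab-format file, sorted by pool name.
# Assumption: field 1 is the dataset name and field 3 is the fstype.
count_zfs_by_pool() {
    awk '$3 == "zfs" {
             split($1, parts, "/")   # parts[1] is the pool name
             count[parts[1]]++
         }
         END { for (pool in count) print pool, count[pool] }' "$1" | sort
}
```

Running `count_zfs_by_pool /etc/mnttab` on a box laid out as described in this thread would be expected to show a handful of entries for ospool and thousands for export, which is the set that has nothing to do with patching.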
Paul B. Henson
2009-Aug-29  05:16 UTC
[zfs-discuss] live upgrade with lots of zfs filesystems
On Fri, 28 Aug 2009, Jens Elkner wrote:

> More info:
> http://iws.cs.uni-magdeburg.de/~elkner/luc/lutrouble.html#luslow

******sweet******!!!!!!

This is *exactly* the functionality I was looking for. Thanks much!!!!

Any Sun people have any idea if Sun has any similar functionality planned for live upgrade? Live upgrade without this capability is basically useless on a system with lots of zfs filesystems.

Jens, thanks again, this is perfect.