Elizabeth Schwartz
2009-Feb-18 03:56 UTC
[zfs-discuss] Zpool scrub in cron hangs u3/u4 server, stumps tech support.
I've got a server that freezes when I run a zpool scrub from cron. Zpool scrub runs fine from the command line, with no errors. The freeze happens within 30 seconds of the scrub starting. The one core dump I succeeded in taking showed the ARC cache eating up all the RAM.

The server is running Solaris 10 u3, kernel patch 127727-11, but it has been patched and seems to have some u4 features (particularly, the arc variables). The only bug report I could find shows a similar bug patched in 120011-14, a patch which I installed many months ago.

Sun support threw up their hands and said to install Solaris 10 u6, which I'm not really happy about doing as a bug fix to a production server running a supported version of Sun OS. Once Upon a Time, Sun used to offer *patches* to paying customers for operating system bugs. I quote the latest ticket note in disgust: "I really don't know what to tell you. S10u6 has many enhancements and improvments to zfs, but most can be gained though patchs with the exception of new features."

I'm trying to escalate the ticket, but really, I'm angry. I've been a big champion of staying with Sun/Solaris over Linux, and one of the reasons has been that traditionally Sun had really good tech support, and you could *get* patches if you needed them. If the answer is going to be "we don't know what the bug is but maybe a later release will fix it - or not", that's not very reassuring.

Any thoughts, besides upgrading? Which we'll do, but it's a production server so I don't want to rush it.

--
Unix Systems Administrator
Harvard Graduate School of Design
Bob Friesenhahn
2009-Feb-18 16:53 UTC
[zfs-discuss] Zpool scrub in cron hangs u3/u4 server, stumps tech support.
On Tue, 17 Feb 2009, Elizabeth Schwartz wrote:
> Sun support threw up their hands and said to install Solaris 10 u6,
> which I'm not really happy about doing as a bug fix to a production
> server running a supported version of Sun OS. Once Upon a Time, Sun
> used to offer *patches* to paying customers for operating system bugs.
> I quote the latest ticket note in disgust: "I really don't know what
> to tell you. S10u6 has many enhancements and improvments to zfs, but
> most can be gained though patchs with the exception of new features."

You should investigate the use of Live Upgrade to upgrade your server to Solaris 10U6 with absolute minimal down-time. Patching only goes so far, and the version you are using is quite old.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Elizabeth Schwartz
2009-Feb-18 16:55 UTC
[zfs-discuss] Zpool scrub in cron hangs u3/u4 server, stumps tech support.
It's an old version but it's a *supported* version and we have a five-figure support contract. That used to matter. I've never used Live Upgrade; I want to try it out, but not on my production file server, and I want to know that this particular bug is fixed first: something more definite than "many improvements".
Bob Friesenhahn
2009-Feb-18 17:44 UTC
[zfs-discuss] Zpool scrub in cron hangs u3/u4 server, stumps tech support.
On Wed, 18 Feb 2009, Elizabeth Schwartz wrote:
> It's an old version but it's a *supported* version and we have a
> five-figure support contract. That used to matter.

I can understand your frustration. ZFS in Solaris 10U3 was a bit rough around the edges. It is definitely improved in later releases.

> I've never used Live Upgrade; I want to try it out but not on my
> production file server, and I want to know that this particular bug is
> fixed first, something more definite than "many improvements"

As long as you have spare bootable partitions, Live Upgrade is exceedingly useful. It allows you to create a new boot environment with the newer Solaris installed, and with all of your local changes applied. You can double-check that everything is ready to go by mounting the new boot environment. Switching to the new boot environment is as simple as 'luactivate' followed by a reboot. It is likely to work the first time, but if it does not, you can reboot to your previous boot environment for minimal server down time. If you are using Grub, then each boot environment is listed in the Grub boot menu.

With proper care, using Live Upgrade is safer (and faster) for production systems than applying large numbers of patches spanning many Solaris 10 generations. You can also use multiple boot environments to apply patches, in order to minimize risk and minimize down time.

If you are able to install Solaris 10U6 with ZFS boot, then subsequent Live Upgrades should be far easier, since boot environments are ZFS file systems in the root pool ('rpool') rather than dedicated partitions.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
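For readers who have not used Live Upgrade, the sequence described above might look roughly like the following. This is only a sketch: the boot environment name, the spare slice, and the install media path are placeholders that do not come from this thread, and the script defaults to a dry run that prints each command instead of executing it.

```shell
#!/bin/sh
# Sketch of a Live Upgrade run. All three values below are hypothetical
# placeholders; substitute your own BE name, spare slice, and media path.
NEW_BE=${NEW_BE:-s10u6}
SPARE_SLICE=${SPARE_SLICE:-/dev/dsk/c0t1d0s0}
MEDIA=${MEDIA:-/cdrom/cdrom0}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

live_upgrade() {
    # Copy the running system into a new boot environment on the spare slice.
    run lucreate -n "$NEW_BE" -m "/:$SPARE_SLICE:ufs" &&
    # Upgrade the inactive boot environment from the install media.
    run luupgrade -u -n "$NEW_BE" -s "$MEDIA" &&
    # Mount the new environment for inspection, then unmount it again.
    run lumount "$NEW_BE" /mnt &&
    run luumount "$NEW_BE" &&
    # Mark the new environment as the one to boot next.
    run luactivate "$NEW_BE" &&
    # Sun's documentation asks for init or shutdown here, not 'reboot'.
    run init 6
}

live_upgrade
```

In a real run you would stop after `lumount` to inspect the new environment by hand before activating it. If the new environment misbehaves, the previous one remains available to boot back into.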
Blake
2009-Feb-18 19:14 UTC
[zfs-discuss] Zpool scrub in cron hangs u3/u4 server, stumps tech support.
Bob is correct to praise LiveUpgrade. It's pretty much risk-free when used properly, provided you have some spare slices/disks.

At the same time, I'd say that this is probably an appropriate time to escalate the bug with support - the answers you are getting aren't satisfactory.

I would also consider creating a user/role with zfs admin privileges only, and trying to run the scrub command from cron as this user - I had a similar problem with an old ZFS version which I worked around by issuing commands as a user other than root.
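The non-root workaround suggested above could be set up along these lines. This is a sketch under assumptions, not a tested fix for this bug: the account name `zfsadmin` and pool name `tank` are invented for illustration, and it assumes the stock "ZFS Storage Management" rights profile is present on the system. The script only prints what it would do unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: a dedicated account that may run zpool commands through RBAC.
# POOL and SCRUB_USER are hypothetical names, not taken from the thread.
POOL=${POOL:-tank}
SCRUB_USER=${SCRUB_USER:-zfsadmin}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

setup_scrub_user() {
    # Create the account, then grant it the ZFS pool administration profile.
    run useradd -m -d "/export/home/$SCRUB_USER" "$SCRUB_USER" &&
    run usermod -P "ZFS Storage Management" "$SCRUB_USER"
}

# Crontab line for that account: scrub every Sunday at 03:00.
# pfexec runs the command with the rights from the assigned profile.
cron_line() {
    echo "0 3 * * 0 /usr/bin/pfexec /usr/sbin/zpool scrub $POOL"
}

setup_scrub_user
cron_line
```

The printed crontab line would go into the new account's crontab (for example via `crontab -e` as that user). Check that cron accepts jobs from the account before relying on it, since a freshly created account may still be locked.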