I want to express my thanks. My gratitude. I am not easily impressed
by technology anymore, but ZFS impressed me this morning.
Sometime late last night a primary server of mine had a critical
fault. One of the PCI cards in a V480 was the cause, and for whatever
reason it destroyed the DC-DC power converters that powered the
primary internal disks. It also dropped the whole machine and 12
zones.
I feared the worst and made the call for service at about midnight
last night. A Sun service tech said he could be there in two hours
or so, but he asked me to check this and check that. The people at
the datacenter were happy to tell me there was a wrench light on,
but other than that they knew nothing.
This machine, like all critical systems I have, uses mirrored disks
in zpools with multiple fibre links to the arrays. I dreaded what
would happen when we tried to boot this box after all the dust was
blown out and the hardware swapped.
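For the curious, the layout is nothing exotic: mirrored pairs split
across two controllers plus a hot spare. As a rough sketch only,
using the device names from the status output further down and not
the exact command from when the pool was built, it amounts to:
# zpool create fibre0 \
    mirror c2t16d0 c5t0d0 \
    mirror c5t1d0 c2t17d0 \
    mirror c5t2d0 c2t18d0 \
    mirror c2t20d0 c5t4d0 \
    mirror c2t21d0 c5t6d0 \
    spare c2t22d0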
Early this morning ... I watched the detailed diags run and finally
saw a nice clean ok prompt.
<*>
Hardware Power On
@(#)OBP 4.22.34 2007/07/23 13:01 Sun Fire 4XX
System is initializing with diag-switch? overrides.
Online: CPU0 CPU1 CPU2 CPU3*
Validating JTAG integrity...Done
.
.
.
CPU0: System POST Completed
Pass/Fail Status = 0000.0000.0000.0000
ESB Overall Status = ffff.ffff.ffff.ffff
<*>
POST Reset
.
.
.
{3} ok show-post-results
System POST Results
Component: Results
CPU/Memory: Passed
IO-Bridge8: Passed
IO-Bridge9: Passed
GPTwo Slots: Passed
Onboard FCAL: Passed
Onboard Net1: Passed
Onboard Net0: Passed
Onboard IDE: Passed
PCI Slots: Passed
BBC0: Passed
RIO: Passed
USB: Passed
RSC: Passed
POST Message: POST PASS
{3} ok boot -s
Eventually I saw my login prompt. There were no warnings about data
corruption. No data loss. No noise at all, in fact. :-O
# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
fibre0                  680G    654G   25.8G    96%  ONLINE     -
z0                     40.2G    103K   40.2G     0%  ONLINE     -
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors
#
Not one error. No message about resilver this or inode that.
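The status also says "scrub: none requested". Just to be paranoid
the next step is a scrub so ZFS walks every block and verifies the
checksums, and some day I should take the hint about the older
on-disk format. Nothing fancy, just the standard commands:
# zpool scrub fibre0
# zpool status -v fibre0
# zpool upgrade fibre0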
Everything booted flawlessly and I was able to see all my zones:
# bin/lz
-----------------------------------------------------------------------
 NAME      ID  STATUS      PATH           HOSTNAME    BRAND      IP
-----------------------------------------------------------------------
 z_001     4   running     /zone/z_001    pluto       solaris8   excl
 z_002     -   installed   /zone/z_002    ldap01      native     shared
 z_003     -   installed   /zone/z_003    openfor     solaris9   shared
 z_004     6   running     /zone/z_004    gaspra      native     shared
 z_005     5   running     /zone/z_005    ibisprd     native     shared
 z_006     7   running     /zone/z_006    io          native     shared
 z_007     1   running     /zone/z_007    nis         native     shared
 z_008     3   running     /zone/z_008    callistoz   native     shared
 z_009     2   running     /zone/z_009    loginz      native     shared
 z_010     -   installed   /zone/z_010    venus       solaris8   shared
 z_011     -   installed   /zone/z_011    adbs        solaris9   shared
 z_012     -   installed   /zone/z_012    auroraux    native     shared
 z_013     8   running     /zone/z_013    osiris      native     excl
 z_014     -   installed   /zone/z_014    jira        native     shared
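For the record, bin/lz is just a little local wrapper script; the
plain Solaris way to get more or less the same listing is:
# zoneadm list -cv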
People love to complain. I see it all the time.
I downloaded this OS for free and I run it in production.
I have support and I am fine with paying for support contracts.
But someone somewhere needs to buy the ZFS guys some keg(s) of
whatever beer they want. Or maybe new Porsche Cayman S toys.
That would be gratitude as something more than just words.
Thank you.
--
Dennis Clarke
PS: the one funny thing is that I had to get a few things swapped
out, and I guess that reset the system clock. It now reports:
# uptime
8:19pm up 3483 day(s), 19:07, 1 user, load average: 0.24, 0.21, 0.18
I don't think that is accurate :-)