I'm happy to see that someone else brought up this topic. I had a nasty long power failure last night that drained the APC/UPS batteries dry.[1] :-(

I changed the subject line somewhat because I feel that the issue is one of honesty as opposed to reliability. I *feel* that ZFS is reliable out past six nines ( rho=0.999999 ) for two reasons: first, it has never failed me, and I have pounded it with some fairly offensive abuse under terrible conditions[2]; secondly, everyone in the computer industry is trying to steal^H^H^H^H^Himplement it into their OS of choice. There must be a reason for that.

However, I have repeatedly run into problems when I need to boot after a power failure. I see vdevs being marked as FAULTED regardless of whether there are actually any hard errors reported by the on-disk SMART firmware. I am able to remove these FAULTed devices temporarily, re-insert the same disk, and then run fine for months. Until the next long power failure.

This is where "honesty" becomes a question, because I have to question the severity of the FAULT when I know from past experience that the disk(s) in question can be removed and then re-inserted and life is fine for months. Were hard disk manufacturers involved in this error message logic? :-P

A power failure, a really nice long one, happened last night, and again when I boot up I see nasty error messages. Here is *precisely* what I saw last night :

{3} ok boot -s
Resetting ...

Sun Fire 480R, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.34, 16384 MB memory installed, Serial #53264354.
Ethernet address 0:3:ba:2c:bf:e2, Host ID: 832cbfe2.

Rebooting with command: boot -s
Boot device: /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@w21000004cfb6f0ff,0:a  File and args: -s
SunOS Release 5.10 Version Generic_138888-03 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Booting to milestone "milestone/single-user:default".
Hostname: jupiter
Requesting System Maintenance Mode
SINGLE USER MODE

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

Mar 24 01:28:04 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005

/***************************************************/
/*  the very first thing I check is zpool fibre0   */
/***************************************************/

# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

*************************************************
* everything looks fine, okay, thank you to ZFS *
*   ... and then I try to boot to full init 3   *
*************************************************

# exit
svc.startd: Returning to milestone all.
Reading ZFS config: done.
Mounting ZFS filesystems: (51/51)

jupiter console login: root
Password:
Last login: Sat Mar  7 19:39:00 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005

# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.
        Once this is done, the pool will no longer be accessible on
        older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

*************************************************************
* everything STILL looks fine, and only seconds have passed *
* Then .. I get bombarded with SEVERITY: Major faults       *
*************************************************************

#
SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Mar 24 01:29:00 GMT 2009
PLATFORM: SUNW,Sun-Fire-480R, CSN: -, HOSTNAME: jupiter
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 3780a2dd-7381-c053-e186-8112b463c2b7
DESC: The number of I/O errors associated with a ZFS device exceeded
      acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for
      more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted.  An
      attempt will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Mar 24 01:29:00 GMT 2009
PLATFORM: SUNW,Sun-Fire-480R, CSN: -, HOSTNAME: jupiter
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 146dad1d-f195-c2d6-c630-c1adcd58b288
DESC: The number of I/O errors associated with a ZFS device exceeded
      acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for
      more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted.  An
      attempt will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.
*********************************************************
* I know that I have been here before, after a power    *
* failure, with similar messages. They were not         *
* entirely honest about the SEVERITY of the device      *
* faults. The faults are certainly not "Major faults".  *
*********************************************************

# zpool status fibre0
  pool: fibre0
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning
        in a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the
        device repaired.
 scrub: resilver in progress for 0h0m, 0.02% done, 21h7m to go
config:

        NAME           STATE     READ WRITE CKSUM
        fibre0         DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c2t16d0    ONLINE       0     0     0
            c5t0d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            c5t1d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c2t17d0  FAULTED      0     0     0  too many errors
              c2t22d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c5t2d0     ONLINE       0     0     0
            c2t18d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t20d0    ONLINE       0     0     0
            c5t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t21d0    ONLINE       0     0     0
            c5t6d0     ONLINE       0     0     0
        spares
          c2t22d0      INUSE     currently in use

errors: No known data errors
# Mar 24 01:29:53 jupiter ntpdate[733]: no server suitable for synchronization found

****************************************************************
* At this point I go look at my Cisco routers, check my AC,    *
* and get things booting. I also curse my new APC gear for not *
* signaling a power failure ... but that is another story.     *
****************************************************************

So can I *trust* what I am seeing? Do I really believe that I have a SEVERE fault in a disk? Last time I did this ( last month actually ) there were two disks faulted. Today there is just one. As usual I will NOT order a new replacement disk. I just let that ZPool sort itself out. It will take an hour or so to sync up that hot spare.
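As an aside, the "action:" line above already names the lighter-weight path: 'zpool clear'. Here is a minimal sketch of that route (a sketch, not a recommendation; the pool and device names are the ones from this thread, and the hypothetical DRYRUN variable defaults to echo so the script only prints the commands and can be sanity-checked on a box that does not have this pool):

```shell
# Sketch: clear a (suspected transient) FAULT instead of replacing the disk.
# DRYRUN=echo (the default here) prints each command instead of running it;
# set DRYRUN to the empty string on the real system to execute them.
POOL=fibre0
DISK=c2t17d0
DRYRUN=${DRYRUN:-echo}

# 'zpool clear' resets the error counters and clears the FAULTED state
$DRYRUN zpool clear "$POOL" "$DISK"
# a scrub afterwards re-reads every block, so the "repair" is verified
$DRYRUN zpool scrub "$POOL"
```

Whether clear-and-scrub is any more honest than detach-and-reattach is exactly the question at hand, of course.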
The machine in question is a production Solaris 10 server :

# uname -a
SunOS jupiter 5.10 Generic_138888-03 sun4u sparc SUNW,Sun-Fire-480R
# cat /etc/release
                       Solaris 10 5/08 s10s_u5wos_10 SPARC
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 24 March 2008

The zpool in question looks like so :

# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH    ALTROOT
fibre0   680G   536G   144G    78%  DEGRADED  -
z0      40.2G   103K  40.2G     0%  ONLINE    -
# zpool status fibre0
  pool: fibre0
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning
        in a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the
        device repaired.
 scrub: resilver completed after 1h35m with 0 errors on Tue Mar 24 03:04:49 2009
config:

        NAME           STATE     READ WRITE CKSUM
        fibre0         DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c2t16d0    ONLINE       0     0     0
            c5t0d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            c5t1d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c2t17d0  FAULTED      0     0     0  too many errors
              c2t22d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c5t2d0     ONLINE       0     0     0
            c2t18d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t20d0    ONLINE       0     0     0
            c5t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t21d0    ONLINE       0     0     0
            c5t6d0     ONLINE       0     0     0
        spares
          c2t22d0      INUSE     currently in use

errors: No known data errors

Is there *really* a severe fault in that disk ?
# luxadm -v display 21000018625d599d
Displaying information for: 21000018625d599d
 Searching directory /dev/es for links to enclosures

DEVICE PROPERTIES for disk: 21000018625d599d
  Vendor:               HPQ
  Product ID:           BD1465822C
  Revision:             HP04
  Serial Num:           3KS36V5N000076218F5R
  Unformatted capacity: 140014.406 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
    Minimum prefetch:   0x0
    Maximum prefetch:   0xffff
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c2t17d0s2
  /devices/pci@8,600000/SUNW,qlc@1/fp@0,0/ssd@w21000018625d599d,0:c,raw
    LUN path port WWN:          21000018625d599d
    Host controller port WWN:   210000e08b08f1a1
    Path status:                O.K.

What does the SMART firmware say ?

# /root/bin/smartctl -a /dev/rdsk/c2t17d0s0
smartctl version 5.33 [sparc-sun-solaris2.8] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HPQ       BD1465822C        Version: HP04
Serial number: 3KS36V5N000076218F5R
Device type: disk
Transport protocol: IEEE 1394 (SBP-2)
Local Time is: Tue Mar 24 14:09:07 2009 GMT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        68 C
Vendor (Seagate) cache information
  Blocks sent to initiator = 615507364
  Blocks received from initiator = 3004562974
  Blocks read from cache and sent to initiator = 94569699
  Number of read and write commands whose size <= segment size = 185763910
  Number of read and write commands whose size > segment size = 0

Error counter log:
          Errors Corrected by          Total   Correction    Gigabytes    Total
              ECC          rereads/   errors   algorithm     processed    uncorrected
          fast | delayed   rewrites  corrected invocations  [10^9 bytes]  errors
read:   8952309        0         0    8952309     8952309      999.277         0
write:        0        0         0          0          12     1328.105         0
verify:  934290        0         0     934290      934290      146.816         0

Non-medium error count:        1
Error Events logging not supported
SMART Self-test log
Num  Test              Status            segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                         number   (hours)
# 1  Background short  Completed              -        31              - [-   -    -]

It is hard to
see, but the total uncorrected errors is zero.

***********************************************
* So let's just correct the "SEVERE" fault.   *
***********************************************

# zpool detach fibre0 c2t17d0
# zpool detach fibre0 c2t22d0
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 1h35m with 0 errors on Tue Mar 24 03:04:49 2009
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          c5t1d0     ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0

errors: No known data errors

# zpool attach fibre0 c5t1d0 c2t17d0
# zpool add fibre0 spare c2t22d0
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h2m, 2.86% done, 1h18m to go
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors
#

I have also learned that you cannot trust that resilver progress report either. It will not take 1h18m to complete. If I wait 20 minutes I'll get *nearly* the same estimate. The process must not be deterministic in nature.
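For what it is worth, the estimate behaves like a simple linear extrapolation of percent done over elapsed time, recomputed at every poll, which would explain why waiting barely moves it. A quick check against the numbers in the status output that follows (0h39m elapsed, 34.24% done), assuming that naive model (an assumption on my part, not the documented heuristic):

```shell
# naive model: time_remaining = elapsed * (100 - pct_done) / pct_done
elapsed_min=39     # "resilver in progress for 0h39m"
pct_done=34.24     # "34.24% done"
eta=$(awk -v e="$elapsed_min" -v p="$pct_done" 'BEGIN {
    t = int(e * (100 - p) / p + 0.5)      # minutes remaining, rounded
    printf "%dh%dm to go", int(t / 60), t % 60
}')
echo "$eta"
```

That reproduces the "1h15m to go" figure exactly, so the report seems to assume the remaining blocks resilver at the average rate so far; the estimate only converges near the end.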
# zpool status
  pool: fibre0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h39m, 34.24% done, 1h15m to go
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

  pool: z0
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        z0            ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s7  ONLINE       0     0     0
            c1t1d0s7  ONLINE       0     0     0

errors: No known data errors

# fmadm faulty -afg
#

I do TOTALLY trust that last line that says "No known data errors", which makes me wonder if the Severe FAULTs are for unknown data errors :-)

-- 
Dennis Clarke

sig du jour : "An appeaser is one who feeds a crocodile, hoping it will
eat him last." -- Winston Churchill

[1] I really want to know where PowerChute for Solaris went to.

[2] I would create a ZPool of striped mirrors based on multiple USB keys
and on disks on IDE/SATA, with or without compression and with
copies={1|2|3}, and while running an ON compile I'd pull the USB keys
out and yank the power on the IDE/SATA or fibre disks. ZFS would not
throw a fatal error nor drop a bit of data. Performance suffered but
data did not.
On Tue, 24 Mar 2009, Dennis Clarke wrote:
>
> However, I have repeatedly run into problems when I need to boot after a
> power failure. I see vdevs being marked as FAULTED regardless of whether
> there are actually any hard errors reported by the on disk SMART
> firmware. I am able to remove these FAULTed devices temporarily and then
> re-insert the same disk again and then run fine for months. Until the
> next long power failure.

In spite of huge detail, you failed to describe to us the technology used to communicate with these disks. The interface adaptors, switches, and wiring topology could make a difference.

> Is there *really* a severe fault in that disk ?
>
> # luxadm -v display 21000018625d599d

This sounds like some sort of fiber channel.

> Transport protocol: IEEE 1394 (SBP-2)

Interesting that it mentions the protocol used by FireWire.

If you are using fiber channel, the device names in the pool specification suggest that Solaris multipathing is not being used (I would expect something long like c4t600A0B800039C9B500000A9C47B4522Dd0). If multipathing is not used, then you either have simplex connectivity, or two competing simplex paths to each device. Multipathing is recommended if you have redundant paths available.

If the disk itself is not aware of its severe faults, then that suggests that there is a transient problem with communicating with the disk. The problem could be in a device driver, adaptor card, FC switch, or cable. If the disk drive also lost power, perhaps the disk is unusually slow at spinning up.

It is easy to blame ZFS for problems. On my system I was experiencing system crashes overnight while running 'zfs scrub' via cron job. The fiber channel card was locking up. Eventually I learned that it was due to a bug in VirtualBox's device driver. If VirtualBox was not left running overnight, then the system would not crash.
Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> On Tue, 24 Mar 2009, Dennis Clarke wrote:
>>
>> However, I have repeatedly run into problems when I need to boot after
>> a power failure. I see vdevs being marked as FAULTED regardless of
>> whether there are actually any hard errors reported by the on disk
>> SMART firmware. I am able to remove these FAULTed devices temporarily
>> and then re-insert the same disk again and then run fine for months.
>> Until the next long power failure.
>
> In spite of huge detail, you failed to describe to us the technology
> used to communicate with these disks. The interface adaptors,
> switches, and wiring topology could make a difference.

Nothing fancy. Dual QLogic ( Sun ) fibre cards directly connected to the back of A5200's. Simple really.

>> Is there *really* a severe fault in that disk ?
>>
>> # luxadm -v display 21000018625d599d
>
> This sounds like some sort of fiber channel.
>
>> Transport protocol: IEEE 1394 (SBP-2)
>
> Interesting that it mentions the protocol used by FireWire.

I have no idea where that is coming from.

> If you are using fiber channel, the device names in the pool
> specification suggest that Solaris multipathing is not being used (I
> would expect something long like
> c4t600A0B800039C9B500000A9C47B4522Dd0). If multipathing is not used,
> then you either have simplex connectivity, or two competing simplex
> paths to each device. Multipathing is recommended if you have
> redundant paths available.

Yes, I have another machine that has mpxio in place. However a power failure also trips phantom faults.

> If the disk itself is not aware of its severe faults then that
> suggests that there is a transient problem with communicating with the
> disk.

You would think so, eh?
But a transient problem that only occurs after a power failure?

> The problem could be in a device driver, adaptor card, FC
> switch, or cable.
> If the disk drive also lost power, perhaps the disk
> is unusually slow at spinning up.

All disks were up at boot; you can see that when I ask for a zpool status at boot time in single user mode. No errors and no faults.

The issue seems to be when fmadm starts up, or perhaps some other service that can throw a fault. I'm not sure.

> It is easy to blame ZFS for problems.

It is easy to blame a power failure for problems, as well as a nice shiny new APC Smart-UPS XL 3000VA RM 3U unit with external extended run time battery that doesn't signal a power failure. I never blame ZFS for anything.

> On my system I was experiencing
> system crashes overnight while running 'zfs scrub' via cron job. The
> fiber channel card was locking up. Eventually I learned that it was
> due to a bug in VirtualBox's device driver. If VirtualBox was not
> left running overnight, then the system would not crash.

VirtualBox? This is a Solaris 10 machine. Nothing fancy. Okay, sorry, nothing way out in the field fancy like VirtualBox.

Dennis
> On Tue, 24 Mar 2009, Dennis Clarke wrote:
>>
>> You would think so, eh?
>> But a transient problem that only occurs after a power failure?
>
> Transient problems are most common after a power failure or during
> initialization.

Well, the issue here is that power was on for ten minutes before I tried to do a boot from the ok prompt.

Regardless, the point is that the ZPool shows no faults at boot time and then shows phantom faults *after* I go to init 3.

That does seem odd.

Dennis
On Tue, 24 Mar 2009, Dennis Clarke wrote:
>
> You would think so, eh?
> But a transient problem that only occurs after a power failure?

Transient problems are most common after a power failure or during initialization.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Tue, 24 Mar 2009, Dennis Clarke wrote:
>
> Regardless, the point is that the ZPool shows no faults at boot time and
> then shows phantom faults *after* I go to init 3.
>
> That does seem odd.

Yes, it does. I assume that you have already taken the obvious first steps and assured that your kernel and device drivers are appropriately patched, so that you are not encountering something which is already fixed.

-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Dennis Clarke wrote:
>> On Tue, 24 Mar 2009, Dennis Clarke wrote:
>>
>>> However, I have repeatedly run into problems when I need to boot after
>>> a power failure. I see vdevs being marked as FAULTED regardless of
>>> whether there are actually any hard errors reported by the on disk
>>> SMART firmware. I am able to remove these FAULTed devices temporarily
>>> and then re-insert the same disk again and then run fine for months.
>>> Until the next long power failure.
>>
>> In spite of huge detail, you failed to describe to us the technology
>> used to communicate with these disks. The interface adaptors,
>> switches, and wiring topology could make a difference.
>
> Nothing fancy. Dual QLogic ( Sun ) fibre cards directly connected to the
> back of A5200's. Simple really.

Run away! Run away! Save yourself a ton of grief and replace the A5200.

>>> Is there *really* a severe fault in that disk ?
>>>
>>> # luxadm -v display 21000018625d599d
>>
>> This sounds like some sort of fiber channel.
>>
>>> Transport protocol: IEEE 1394 (SBP-2)
>>
>> Interesting that it mentions the protocol used by FireWire.
>
> I have no idea where that is coming from.
>
>> If you are using fiber channel, the device names in the pool
>> specification suggest that Solaris multipathing is not being used (I
>> would expect something long like
>> c4t600A0B800039C9B500000A9C47B4522Dd0). If multipathing is not used,
>> then you either have simplex connectivity, or two competing simplex
>> paths to each device. Multipathing is recommended if you have
>> redundant paths available.
>
> Yes, I have another machine that has mpxio in place. However a power
> failure also trips phantom faults.
>
>> If the disk itself is not aware of its severe faults then that
>> suggests that there is a transient problem with communicating with the
>> disk.
>
> You would think so, eh?
> But a transient problem that only occurs after a power failure?
>
>> The problem could be in a device driver, adaptor card, FC
>> switch, or cable. If the disk drive also lost power, perhaps the disk
>> is unusually slow at spinning up.
>
> All disks were up at boot, you can see that when I ask for a zpool
> status at boot time in single user mode. No errors and no faults.
>
> The issue seems to be when fmadm starts up or perhaps some other
> service that can throw a fault. I'm not sure.

The following will help you diagnose where the error messages are generated from. I doubt it is a problem with the disk, per se, but you will want to double check your disk firmware to make sure it is up to date (I've got scars):

    fmadm faulty
    fmdump -eV

-- richard
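An 'fmdump -e' run after an incident like this emits one ereport per failed I/O, which gets long fast. One compact way to summarize such a log is to count events per class; a sketch, run here against a captured fragment of the fmdump -e output posted later in this thread rather than against a live fmd:

```shell
# count fmdump -e events per ereport class
# (on a live system, replace the sample with:  fmdump -e)
sample='TIME                 CLASS
Mar 23 23:47:28.5586 ereport.fs.zfs.io
Mar 23 23:47:28.5594 ereport.fs.zfs.io
Mar 23 23:47:28.5588 ereport.fs.zfs.io
Mar 23 23:48:41.5957 ereport.fs.zfs.io'
counts=$(printf '%s\n' "$sample" | awk 'NR > 1 { n[$4]++ }
    END { for (c in n) print n[c], c }')
echo "$counts"
```

A burst of ereport.fs.zfs.io events all inside the same second, as seen here, points at a path-level hiccup rather than a slowly dying disk.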
Hey, Dennis -

I can't help but wonder if the failure is a result of zfs itself finding some problems post restart... Is there anything in your FMA logs?

    fmstat

for a summary, and

    fmdump

for a summary of the related errors, eg:

drteeth:/tmp # fmdump
TIME                 UUID                                 SUNW-MSG-ID
Nov 03 13:57:29.4190 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 ZFS-8000-D3
Nov 03 13:57:29.9921 916ce3e2-0c5c-e335-d317-ba1e8a93742e ZFS-8000-D3
Nov 03 14:04:58.8973 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d ZFS-8000-CS
Mar 05 18:04:40.7116 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-4M Repaired
Mar 05 18:04:40.7875 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-6U Resolved
Mar 05 18:04:41.0052 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-4M Repaired
Mar 05 18:04:41.0760 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-6U Resolved

Then, for example,

    fmdump -vu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7

and

    fmdump -Vvu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7

will show more and more information about the error. Note that some of it might seem like rubbish. The important bits should be obvious though - things like the SUNW message ID (like ZFS-8000-D3), which can be pumped into sun.com/msg to see what exactly it's going on about.

Note also that there should be something interesting in the /var/adm/messages log to match any 'faulted' devices.

You might also find an fmdump -e and fmdump -eV to be interesting - this is the *error* log as opposed to the *fault* log. (Every 'thing that goes wrong' is an error; only those that are diagnosed are considered a fault.)

Note that in all of these fm[dump|stat] commands, you are really only looking at the two sets of data: the errors - that is, the telemetry incoming to FMA - and the faults. If you include a -e, you view the errors; otherwise, you are looking at the faults.

By the way - sun.com/msg has a great PDF on it about the predictive self healing technologies in Solaris 10 and will offer more interesting information.
Would be interesting to see *why* ZFS / FMA is feeling the need to fault your devices.

I was interested to see on one of my boxes that I have actually had a *lot* of errors, which I'm now going to have to investigate... Looks like I have a dud rocket in my system... :)

Oh - and I saw this:

Nov 03 14:04:31.2783 ereport.fs.zfs.checksum

Score one more for ZFS! This box has a measly 300GB mirrored, and I have already seen dud data. (heh... It's also got non-ecc memory... ;)

Cheers!

Nathan.

Dennis Clarke wrote:
>> On Tue, 24 Mar 2009, Dennis Clarke wrote:
>>> You would think so, eh?
>>> But a transient problem that only occurs after a power failure?
>> Transient problems are most common after a power failure or during
>> initialization.
>
> Well, the issue here is that power was on for ten minutes before I tried
> to do a boot from the ok prompt.
>
> Regardless, the point is that the ZPool shows no faults at boot time and
> then shows phantom faults *after* I go to init 3.
>
> That does seem odd.
>
> Dennis
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
//////////////////////////////////////////////////////////////////
// Nathan Kroenert              nathan.kroenert at sun.com        //
// Systems Engineer             Phone:  +61 3 9869-6255         //
// Sun Microsystems             Fax:    +61 3 9869-6288         //
// Level 7, 476 St. Kilda Road  Mobile: 0419 305 456            //
// Melbourne 3004  Victoria     Australia                       //
//////////////////////////////////////////////////////////////////
> Hey, Dennis -
>
> I can't help but wonder if the failure is a result of zfs itself finding
> some problems post restart...

Yes, yes, this is what I am feeling also, but I need to find the data, and then I can sleep at night. I am certain that ZFS does not just toss out faults on a whim; there must be a deterministic, logical, code-based reason for those faults that occur *after* I go to init 3.

> Is there anything in your FMA logs?

Oh God yes, brace yourself :-)

http://www.blastwave.org/dclarke/zfs/fmstat.txt

[ I edit the whitespace here for clarity ]

# fmstat
module             ev_recv ev_acpt wait     svc_t  %w %b open solve memsz bufsz
cpumem-diagnosis         0       0  0.0       2.7   0  0    3     0  4.2K  1.1K
cpumem-retire            0       0  0.0       0.2   0  0    0     0     0     0
disk-transport           0       0  0.0      45.7   0  0    0     0   40b     0
eft                      0       0  0.0       0.7   0  0    0     0  1.2M     0
fabric-xlate             0       0  0.0       0.7   0  0    0     0     0     0
fmd-self-diagnosis       3       0  0.0       0.2   0  0    0     0     0     0
io-retire                0       0  0.0       0.2   0  0    0     0     0     0
snmp-trapgen             2       0  0.0       1.7   0  0    0     0   32b     0
sysevent-transport       0       0  0.0      75.4   0  0    0     0     0     0
syslog-msgs              2       0  0.0       1.4   0  0    0     0     0     0
zfs-diagnosis          296     252  2.0  236719.7  98  0    1     2  176b  144b
zfs-retire               4       0  0.0      27.4   0  0    0     0     0     0

zfs-diagnosis svc_t=236719.7 ?

> for a summary and
>
> fmdump
>
> for a summary of the related errors

http://www.blastwave.org/dclarke/zfs/fmdump.txt

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Dec 05 21:31:46.1069 aa3bfcfa-3261-cde4-d381-dae8abf296de ZFS-8000-D3
Mar 07 08:46:43.6238 4c8b199b-add1-c3fe-c8d6-9deeff91d9de ZFS-8000-FD
Mar 07 19:37:27.9819 b4824ce2-8f42-4392-c7bc-ab2e9d14b3b7 ZFS-8000-FD
Mar 07 19:37:29.8712 af726218-f1dc-6447-f581-cc6bb1411aa4 ZFS-8000-FD
Mar 07 19:37:30.2302 58c9e01f-8a80-61b0-ffea-ded63a9b076d ZFS-8000-FD
Mar 07 19:37:31.6410 3b0bfd9d-fc39-e7c2-c8bd-879cad9e5149 ZFS-8000-FD
Mar 10 19:37:08.8289 aa3bfcfa-3261-cde4-d381-dae8abf296de FMD-8000-4M Repaired
Mar 23 23:47:36.9701 2b1aa4ae-60e4-c8ef-8eec-d92a18193e7a ZFS-8000-FD
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
Mar 24 01:29:02.1649
146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD

# fmdump -vu 3780a2dd-7381-c053-e186-8112b463c2b7
TIME                 UUID                                 SUNW-MSG-ID
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=fibre0/vdev=444604062b426970
           Affects: zfs://pool=fibre0/vdev=444604062b426970
               FRU: -
          Location: -

# fmdump -vu 146dad1d-f195-c2d6-c630-c1adcd58b288
TIME                 UUID                                 SUNW-MSG-ID
Mar 24 01:29:02.1649 146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=fibre0/vdev=23e4d7426f941f52
           Affects: zfs://pool=fibre0/vdev=23e4d7426f941f52
               FRU: -
          Location: -

> will show more and more information about the error. Note that some of
> it might seem like rubbish. The important bits should be obvious though
> - things like the SUNW message ID (like ZFS-8000-D3), which can be
> pumped into
>
> sun.com/msg

like so : http://www.sun.com/msg/ZFS-8000-FD
or see    http://www.blastwave.org/dclarke/zfs/ZFS-8000-FD.txt

Article for Message ID: ZFS-8000-FD

    Too many I/O errors on ZFS device

    Type      Fault
    Severity  Major

    Description
        The number of I/O errors associated with a ZFS device exceeded
        acceptable levels.

    Automated Response
        The device has been offlined and marked as faulted. An attempt
        will be made to activate a hot spare if available.

    Impact
        The fault tolerance of the pool may be affected.

Yep, I agree, that is what I saw.

> Note also that there should also be something interesting in the
> /var/adm/messages log to match any 'faulted' devices.
>
> You might also find an
>
> fmdump -e

spooky long list of events :

TIME                 CLASS
Mar 23 23:47:28.5586 ereport.fs.zfs.io
Mar 23 23:47:28.5594 ereport.fs.zfs.io
Mar 23 23:47:28.5588 ereport.fs.zfs.io
Mar 23 23:47:28.5592 ereport.fs.zfs.io
Mar 23 23:47:28.5593 ereport.fs.zfs.io
.
.
.
Mar 23 23:47:28.5622 ereport.fs.zfs.io
Mar 23 23:47:28.5560 ereport.fs.zfs.io
Mar 23 23:47:28.5658 ereport.fs.zfs.io
Mar 23 23:48:41.5957 ereport.fs.zfs.io

http://www.blastwave.org/dclarke/zfs/fmdump_e.txt

Ouch, that is a nasty long list, all in a few seconds.

> and
>
> fmdump -eV

a very detailed, verbose, long list, with such entries as :

Mar 23 2009 23:48:41.595757900 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena = 0x79c098255f400c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xe3bb9417bc13c68d
                vdev = 0x444604062b426970
        (end detector)

        pool = fibre0
        pool_guid = 0xe3bb9417bc13c68d
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x444604062b426970
        vdev_type = disk
        vdev_path = /dev/dsk/c2t17d0s0
        vdev_devid = id1,ssd@n20000018625d599d/a
        parent_guid = 0x2cc7f46f722cfd61
        parent_type = mirror
        zio_err = 6
        zio_offset = 0xf97ebf400
        zio_size = 0x1400
        __ttl = 0x1
        __tod = 0x49c81fd9 0x23828b4c

> to be interesting - This is the *error* log as opposed to the *fault*
> log. (Every 'thing that goes wrong' is an error, only those that are
> diagnosed are considered a fault.)

I seem to have many things wrong. Many things. :-(

> Note that in all of these fm[dump|stat] commands, you are really only
> looking at the two sets of data. The errors - that is the telemetry
> incoming to FMA - and the faults. If you include a -e, you view the
> errors, otherwise, you are looking at the faults.
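The fault-log half of this can be scripted the same way: pull a UUID out of the fmdump summary and feed it back with -vu for detail. A sketch, run here against a captured sample of the fmdump output shown above instead of a live fmd (on the real box the first command would simply be fmdump):

```shell
# grab the UUID ($4) of the most recent fault record in fmdump's summary;
# the sample lines are the ones shown earlier in this thread
sample='TIME                 UUID                                 SUNW-MSG-ID
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
Mar 24 01:29:02.1649 146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD'
uuid=$(printf '%s\n' "$sample" | awk 'END { print $4 }')
echo "$uuid"
# on the live system, the follow-up would be:  fmdump -vu "$uuid"
```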
>
> By the way - sun.com/msg has a great PDF on it about the predictive self
> healing technologies in Solaris 10 and will offer more interesting
> information.

I think I have seen it before; it is very "marketing" focused.

> Would be interesting to see *why* ZFS / FMA is feeling the need to fault
> your devices.

It is a pile of information/data that still makes me wonder why, because I can easily detach and reattach those disks and be back in business for months with no issue.

> I was interested to see on one of my boxes that I have actually had a
> *lot* of errors, which I'm now going to have to investigate... Looks
> like I have a dud rocket in my system... :)

I probably have a dud in there also, but ZFS refuses to FAULT it while under normal day to day load. That is what is very odd.

> Oh - And I saw this:
>
> Nov 03 14:04:31.2783 ereport.fs.zfs.checksum
>
> Score one more for ZFS! This box has a measly 300GB mirrored, and I have
> already seen dud data. (heh... It's also got non-ecc memory... ;)

I don't think I have to worry about ECC memory on Sun hardware, but I am getting concerned about those disks that I have. I am just waiting for a FAULT that will not go away so easily.

Thanks for the reply and the helpful pointers. Do people on mail lists say "thank you" anymore? Well, I just did.

Dennis