Carsten Aulbert
2010-Feb-04 15:30 UTC
[zfs-discuss] zfs/sol10u8 less stable than in sol10u5?
Hi all,
it might not be a ZFS issue (and thus be on the wrong list), but maybe
there's someone here who can give us a good hint:
We operate 13 x4500s and started to experiment with non-Sun-blessed SSDs in
them. We had been running Solaris 10u5; since we wanted to use the SSDs as log
devices, we upgraded to the latest and greatest 10u8 and changed the zpool
layout[1]. However, on the first machine we found many, many problems with
various disks "failing" in different vdevs (I wrote about this on this list in
December, IIRC).
After going through this with Sun, they gave us hints but mostly blamed
(maybe rightfully) the Intel X25-E in there; we considered the 2.5"-to-3.5"
converter to be at fault as well. So for the next test we placed the SSD into
the tray without a conversion unit, but that box (a different one) failed with
the same problems.
Now, having "learned" from this experience, we did the same to another box but
without any SSD, i.e. we jumpstarted the box, installed 10u8, redid the zpool
and started to fill in data. During today's scrub this suddenly happened:
s09:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h9m, 3.89% done, 4h2m to go
config:
        NAME          STATE     READ WRITE CKSUM
        atlashome     DEGRADED     0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c4t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     1
            c0t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     2
            c4t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
          raidz1      DEGRADED     0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     1
            spare     DEGRADED     0     0     0
              c4t4d0  DEGRADED     5     0    11  too many errors
              c0t4d0  ONLINE       0     0     0  5.38G resilvered
          raidz1      ONLINE       0     0     0
            c5t4d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
        spares
          c0t4d0      INUSE     currently in use
          c7t7d0      AVAIL
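(For what it's worth, the follow-up that the status message suggests would look
roughly like this -- a sketch only, using the device names from the output
above, and nothing to be run before the resilver has finished:)

```shell
# If the errors on c4t4d0 were transient: reset the counters and
# return the hot spare c0t4d0 to the pool.
zpool clear atlashome c4t4d0
zpool detach atlashome c0t4d0

# If c4t4d0 really is bad: detach it instead, which makes the
# resilvered spare c0t4d0 a permanent member of the vdev.
zpool detach atlashome c4t4d0
```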
Also similar to the other hosts are the much, much higher soft/hard error
counts in iostat:
s09:~# iostat -En|grep Soft
c2t0d0 Soft Errors: 1 Hard Errors: 2 Transport Errors: 0
c3t0d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
c5t0d0 Soft Errors: 2805 Hard Errors: 0 Transport Errors: 0
c6t0d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c4t0d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c1t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t1d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t1d0 Soft Errors: 4002 Hard Errors: 1 Transport Errors: 0
c4t1d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t1d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c1t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c6t2d0 Soft Errors: 4002 Hard Errors: 1 Transport Errors: 0
c0t1d0 Soft Errors: 4002 Hard Errors: 2 Transport Errors: 0
c4t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c5t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c6t3d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c1t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t2d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t4d0 Soft Errors: 4004 Hard Errors: 6 Transport Errors: 0
c5t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c4t5d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c1t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c5t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t6d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c4t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c0t4d0 Soft Errors: 4001 Hard Errors: 0 Transport Errors: 0
c6t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c5t7d0 Soft Errors: 4000 Hard Errors: 1 Transport Errors: 0
c4t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c6t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c1t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t6d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c6t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c0t7d0 Soft Errors: 4000 Hard Errors: 0 Transport Errors: 0
c7t0d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t1d0 Soft Errors: 4003 Hard Errors: 2 Transport Errors: 0
c7t2d0 Soft Errors: 4003 Hard Errors: 1 Transport Errors: 0
c7t3d0 Soft Errors: 4001 Hard Errors: 1 Transport Errors: 0
c7t4d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t5d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t6d0 Soft Errors: 4002 Hard Errors: 0 Transport Errors: 0
c7t7d0 Soft Errors: 3997 Hard Errors: 0 Transport Errors: 0
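(To compare these counters across hosts or across reboots, I run a small awk
summary over saved `iostat -En` output; a minimal sketch, with a few sample
lines standing in for the real output -- the field positions match the
per-device format above:)

```shell
# A few saved lines standing in for real `iostat -En` output:
sample='c2t0d0 Soft Errors: 1 Hard Errors: 2 Transport Errors: 0
c5t0d0 Soft Errors: 2805 Hard Errors: 0 Transport Errors: 0
c4t4d0 Soft Errors: 4004 Hard Errors: 6 Transport Errors: 0'

# Fields: $1 = device, $4 = soft error count, $7 = hard error count.
summary=$(printf '%s\n' "$sample" | awk '/Soft Errors/ {
    soft += $4; hard += $7
    if ($7 > 0) printf "%s hard=%d\n", $1, $7
} END { printf "total soft=%d hard=%d\n", soft, hard }')
printf '%s\n' "$summary"
```

On a live box you would pipe `iostat -En` straight into the awk part instead
of using saved sample lines.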
(after an uptime of only a couple of days):
s09:~# uptime
4:27pm up 2 day(s), 21:31, 1 user, load average: 0.17, 0.34, 1.45
s09:~# uname -a
SunOS s09 5.10 Generic_142901-03 i86pc i386 i86pc
We checked these numbers before the upgrade: there were no hard errors and an
order of magnitude fewer soft errors, after tens of days of uptime.
Is anyone aware of a regression when going to 10u8? Might it be ZFS-related,
or can the hardware of three x4500s rot away within days of an upgrade when
the environment has not changed at all?
Thanks a lot in advance for any hint
Carsten
[1] Before, we used 3 vdevs with 15, 15 and 16 disks; now we use 9 vdevs with
5 disks each (plus 2 hot spares).
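(For reference, the new layout would have been created with something like the
following -- a hypothetical reconstruction, with the vdev grouping and the two
spares read off the `zpool status` output above:)

```shell
# Hypothetical reconstruction of the 9 x 5-disk raidz1 layout,
# grouping taken from the zpool status output:
zpool create atlashome \
  raidz1 c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \
  raidz1 c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 \
  raidz1 c7t1d0 c0t2d0 c1t2d0 c4t2d0 c5t2d0 \
  raidz1 c6t2d0 c7t2d0 c0t3d0 c1t3d0 c4t3d0 \
  raidz1 c5t3d0 c6t3d0 c7t3d0 c1t4d0 c4t4d0 \
  raidz1 c5t4d0 c6t4d0 c7t4d0 c0t5d0 c1t5d0 \
  raidz1 c4t5d0 c5t5d0 c6t5d0 c7t5d0 c0t6d0 \
  raidz1 c1t6d0 c4t6d0 c5t6d0 c6t6d0 c7t6d0 \
  raidz1 c0t7d0 c1t7d0 c4t7d0 c5t7d0 c6t7d0 \
  spare c0t4d0 c7t7d0
```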