David Orman
2008-Aug-26 10:54 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
Hi,

After upgrading to b95 of OSOL/Indiana, and doing a ZFS upgrade to the newer revision, all arrays I have using ZFS mirroring are displaying errors. This started happening immediately after the ZFS upgrades. Here is an example:

ormandj at neutron.corenode.com:~$ zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h12m with 6 errors on Wed Aug 20 06:26:33 2008
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       DEGRADED     0     0    12
          mirror    DEGRADED     0     0    12
            c6d0s0  DEGRADED     0     0    24  too many errors
            c7d0s0  DEGRADED     0     0    24  too many errors

errors: No known data errors
ormandj at neutron.corenode.com:~$

ormandj at neutron.corenode.com:~$ zpool get all rpool
NAME   PROPERTY     VALUE                     SOURCE
rpool  size         137G                      -
rpool  used         21.0G                     -
rpool  available    116G                      -
rpool  capacity     15%                       -
rpool  altroot      -                         default
rpool  health       DEGRADED                  -
rpool  guid         10397084409580638341      -
rpool  version      11                        default
rpool  bootfs       rpool/ROOT/opensolaris-4  local
rpool  delegation   on                        default
rpool  autoreplace  off                       default
rpool  cachefile    -                         default
rpool  failmode     wait                      default
ormandj at neutron.corenode.com:~$

ormandj at neutron.corenode.com:~$ pfexec prtvtoc /dev/rdsk/c6d0s0
* /dev/rdsk/c6d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*   18240 cylinders
*   18238 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00    4209030 288784440 292993469
       1      3    01      16065   4192965   4209029
       2      5    01          0 292993470 292993469
       8      1    01          0     16065     16064

ormandj at neutron.corenode.com:~$ pfexec prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*   18240 cylinders
*   18238 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00    4209030 288784440 292993469
       1      3    01      16065   4192965   4209029
       2      5    01          0 292993470 292993469
       8      1    01          0     16065     16064
ormandj at neutron.corenode.com:~$

These are root pools, and this is the method in which they were created:

#1 - Partition first drive during installation.
#2 - Once OSOL was installed:
     prtvtoc /dev/rdsk/c6d0s0 | fmthard -s - /dev/rdsk/c7d0s0
#3 - zpool attach rpool c6d0s0 c7d0s0
#4 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c6d0s0
#5 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c7d0s0

Anybody have any ideas? I'd blame failing hardware, but this started happening on 3 different machines, immediately following the version upgrade (it prompted me to upgrade my pools and I did.) I thought there might be a problem, so I detached the secondary drives on all systems and performed the steps listed above again. Still having errors.

Please let me know what other data would be useful for you, and I'll provide it; I'm interested in getting this resolved as soon as possible.

Cheers,
David
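The five steps above can be sketched as a single script. This is only a sketch: the device names c6d0s0/c7d0s0 are specific to this system, and both fmthard and installgrub overwrite the target disk's label and boot blocks, so the assumption is that the second disk is blank.

```shell
#!/bin/sh
# Sketch of the root-pool mirror procedure described above.
# Assumes an existing rpool on $SRC and a same-size blank disk for $DST.
SRC=c6d0s0
DST=c7d0s0

# Copy the partition table from the first disk to the second.
prtvtoc /dev/rdsk/$SRC | pfexec fmthard -s - /dev/rdsk/$DST

# Attach the second slice to the root pool, forming a mirror;
# this triggers a resilver of the existing data onto $DST.
pfexec zpool attach rpool $SRC $DST

# Install GRUB on both disks so either one can boot the system.
pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/$SRC
pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/$DST

# Check resilver progress; the mirror is redundant only once it completes.
pfexec zpool status rpool
```

The resilver-completion check matters: detaching or rebooting before it finishes leaves the new half of the mirror incomplete.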
Nils Goroll
2008-Aug-26 12:42 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
Hi David, have you tried mounting and re-mounting all filesystems which are not being mounted automatically? See other posts to zfs-discuss. Nils
Nils Goroll
2008-Aug-26 12:43 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
Glitch - that should have read:

> have you tried mounting and re-mounting all filesystems which are not
                              ^^^^^^^^^^^
                              unmounting
David Orman
2008-Aug-26 15:35 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
I've rebooted the system(s), which should accomplish this. I'm not clear which posts you are referring to; I just joined the list today. The ZFS pool is being mounted automatically, and that is the only filesystem on my system. I filed http://defect.opensolaris.org/bz/show_bug.cgi?id=3079 (bug 3079) as well:

ormandj at neutron.corenode.com:~$ mount
/ on rpool/ROOT/opensolaris-4 read/write/setuid/devices/dev=2d90002 on Wed Dec 31 18:00:00 1969
/devices on /devices read/write/setuid/devices/dev=4a00000 on Sat Aug 16 09:06:25 2008
/dev on /dev read/write/setuid/devices/dev=4a40000 on Sat Aug 16 09:06:25 2008
/system/contract on ctfs read/write/setuid/devices/dev=4ac0001 on Sat Aug 16 09:06:25 2008
/proc on proc read/write/setuid/devices/dev=4b00000 on Sat Aug 16 09:06:25 2008
/etc/mnttab on mnttab read/write/setuid/devices/dev=4b40001 on Sat Aug 16 09:06:25 2008
/etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=4b80001 on Sat Aug 16 09:06:25 2008
/system/object on objfs read/write/setuid/devices/dev=4bc0001 on Sat Aug 16 09:06:25 2008
/etc/dfs/sharetab on sharefs read/write/setuid/devices/dev=4c00001 on Sat Aug 16 09:06:25 2008
/lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1 read/write/setuid/devices/dev=2d90002 on Sat Aug 16 09:06:35 2008
/dev/fd on fd read/write/setuid/devices/dev=4d00001 on Sat Aug 16 09:06:36 2008
/tmp on swap read/write/setuid/devices/xattr/dev=4b80002 on Sat Aug 16 09:06:36 2008
/var/run on swap read/write/setuid/devices/xattr/dev=4b80003 on Sat Aug 16 09:06:36 2008
/opt on rpool/ROOT/opensolaris-4/opt read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90004 on Sat Aug 16 09:06:36 2008
/export on rpool/export read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90007 on Sat Aug 16 09:06:39 2008
/export/home on rpool/export/home read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90008 on Sat Aug 16 09:06:39 2008
/rpool on rpool read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90009 on Sat Aug 16 09:06:39 2008
/rpool/ROOT on rpool/ROOT read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d9000a on Sat Aug 16 09:06:39 2008

ormandj at neutron.corenode.com:~$ df -h
Filesystem                      Size  Used Avail Use% Mounted on
rpool/ROOT/opensolaris-4        120G  5.9G  114G   5% /
swap                            3.5G  304K  3.5G   1% /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1  120G  5.9G  114G   5% /lib/libc.so.1
swap                            3.5G   40K  3.5G   1% /tmp
swap                            3.5G   36K  3.5G   1% /var/run
rpool/ROOT/opensolaris-4/opt    115G  582M  114G   1% /opt
rpool/export                    114G   19K  114G   1% /export
rpool/export/home               124G  9.3G  114G   8% /export/home
rpool                           114G   61K  114G   1% /rpool
rpool/ROOT                      114G   18K  114G   1% /rpool/ROOT
ormandj at neutron.corenode.com:~$

On Tue, Aug 26, 2008 at 7:42 AM, Nils Goroll <slink at schokola.de> wrote:
> Hi David,
>
> have you tried mounting and re-mounting all filesystems which are not being
> mounted automatically? See other posts to zfs-discuss.
>
> Nils
David Orman
2008-Aug-26 16:00 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
After rebooting, I ran a zpool scrub on the root pool, to see if the issue was resolved:

ormandj at neutron.corenode.com:~$ pfexec zpool status
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h11m with 1 errors on Tue Aug 26 10:58:01 2008
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     2
          mirror    ONLINE       0     0     2
            c6d0s0  ONLINE       0     0     4
            c7d0s0  ONLINE       0     0     4

errors: No known data errors
ormandj at neutron.corenode.com:~$ uptime
 10:59am  up 0:14,  1 user,  load average: 0.02, 0.05, 0.04
ormandj at neutron.corenode.com:~$ uname -a
SunOS neutron.corenode.com 5.11 snv_95 i86pc i386 i86pc Solaris
ormandj at neutron.corenode.com:~$

Obviously not. :( Any other suggestions?

Cheers,
David

On Tue, Aug 26, 2008 at 10:35 AM, David Orman <ormandj at corenode.com> wrote:
> I've rebooted the system(s), which should accomplish this. I'm not clear
> which posts you are referring to, I just joined the list today. The ZFS pool
> is being mounted automatically, and that is the only filesystem on my
> system. I filed: http://defect.opensolaris.org/bz/show_bug.cgi?id=3079
> (bug 3079) as well:
> [mount and df -h output quoted from the previous message snipped]
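The recheck cycle used here can be sketched as follows. This is a hedged sketch, not a prescribed fix: it assumes the pool is named rpool, and it uses zpool status -v, which lists the files affected by any recorded errors.

```shell
# Sketch: reset the error counters, run a fresh scrub, then inspect.
# Run via pfexec (or as root); pool name assumed to be rpool.
pfexec zpool clear rpool        # zero the READ/WRITE/CKSUM counters
pfexec zpool scrub rpool        # start a new scrub of the whole pool
pfexec zpool status -v rpool    # -v lists affected files, if any are known
```

Clearing first matters for diagnosis: any non-zero counters after the new scrub were produced by that scrub, not left over from before the reboot.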
David Orman
2008-Aug-26 17:37 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
After a helpful email from Miles, I destroyed all of my other opensolaris-* filesystems (using beadm destroy) instead of following his suggestion to mount/unmount them all (easier this way). I did another scrub:

ormandj at neutron.corenode.com:~$ pfexec zpool status
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h11m with 1 errors on Tue Aug 26 12:34:06 2008
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6d0s0  ONLINE       0     0     0
            c7d0s0  ONLINE       0     0     0

errors: No known data errors
ormandj at neutron.corenode.com:~$

I don't see any checksum errors (yet), and the scrub did complete - but it reports completing with 1 error, while nothing is listed as an error, so I have no clue what it was. I'll try this on the other servers and see if the checksum errors start occurring again after usage.

Cheers,
David

PS - This is the thread Miles pointed me to:
http://www.opensolaris.org/jive/thread.jspa?threadID=70111&tstart=30

On Tue, Aug 26, 2008 at 11:00 AM, David Orman <ormandj at corenode.com> wrote:
> After rebooting, I ran a zpool scrub on the root pool, to see if the issue
> was resolved:
> [zpool status output quoted from the previous message snipped]
> Obviously not. :( Any other suggestions?
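The boot-environment cleanup described above can be sketched as follows. A sketch under stated assumptions: the BE name opensolaris-1 is an example, the active BE here is opensolaris-4 and must be kept, and beadm destroy permanently removes a boot environment and its datasets.

```shell
# Sketch: list boot environments, remove the inactive ones, then re-scrub.
beadm list                        # shows BEs; the active one is flagged
pfexec beadm destroy opensolaris-1   # example name; repeat for each old BE
pfexec zpool scrub rpool             # re-verify the pool after cleanup
pfexec zpool status rpool            # confirm the checksum counters stay 0
```

Destroying a BE frees the snapshots and clones it held, which is why it substitutes here for mounting and unmounting each old root filesystem by hand.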