David Orman
2008-Aug-26 10:54 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
Hi,
After upgrading to b95 of OSOL/Indiana, and doing a ZFS upgrade to the newer
revision, all arrays I have using ZFS mirroring are displaying errors. This
started happening immediately after ZFS upgrades. Here is an example:
ormandj at neutron.corenode.com:~$ zpool status
pool: rpool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using ''zpool clear'' or replace the device with
''zpool replace''.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 0h12m with 6 errors on Wed Aug 20 06:26:33
2008
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 12
mirror DEGRADED 0 0 12
c6d0s0 DEGRADED 0 0 24 too many errors
c7d0s0 DEGRADED 0 0 24 too many errors
errors: No known data errors
ormandj at neutron.corenode.com:~$
ormandj at neutron.corenode.com:~$ zpool get all rpool
NAME PROPERTY VALUE SOURCE
rpool size 137G -
rpool used 21.0G -
rpool available 116G -
rpool capacity 15% -
rpool altroot - default
rpool health DEGRADED -
rpool guid 10397084409580638341 -
rpool version 11 default
rpool bootfs rpool/ROOT/opensolaris-4 local
rpool delegation on default
rpool autoreplace off default
rpool cachefile - default
rpool failmode wait default
ormandj at neutron.corenode.com:~$
ormandj at neutron.corenode.com:~$ pfexec prtvtoc /dev/rdsk/c6d0s0
* /dev/rdsk/c6d0s0 partition map
*
* Dimensions:
* 512 bytes/sector
* 63 sectors/track
* 255 tracks/cylinder
* 16065 sectors/cylinder
* 18240 cylinders
* 18238 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 2 00 4209030 288784440 292993469
1 3 01 16065 4192965 4209029
2 5 01 0 292993470 292993469
8 1 01 0 16065 16064
ormandj at neutron.corenode.com:~$ pfexec prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Dimensions:
* 512 bytes/sector
* 63 sectors/track
* 255 tracks/cylinder
* 16065 sectors/cylinder
* 18240 cylinders
* 18238 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 2 00 4209030 288784440 292993469
1 3 01 16065 4192965 4209029
2 5 01 0 292993470 292993469
8 1 01 0 16065 16064
ormandj at neutron.corenode.com:~$
These are root pools, and this is the method in which they were created:
#1 - Partition first drive during installation.
#2 - Once OSOL was installed: prtvtoc /dev/rdsk/c6d0s0 | fmthard -s -
/dev/rdsk/c7d0s0
#3 - zpool attach rpool c6d0s0 c7d0s0
#4 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c6d0s0
#5 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c7d0s0
Anybody have any ideas? I''d blame failing hardware, but this started
happening on 3 different machines, immediately following the versioin
upgrade (it prompted me to upgrade my pools and I did.) I thought there
might be a problem, so I detached the secondary drives on all systems, and
performed the steps above listed again. Still having errors. Please let me
know what other data would be useful for you, and I''ll provide it,
I''m
interested in getting this resolved as soon as possible.
Cheers,
David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080826/ca09ca8f/attachment.html>
Nils Goroll
2008-Aug-26 12:42 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
Hi David, have you tried mounting and re-mounting all filesystems which are not being mounted automatically? See other posts to zfs-discuss. Nils
Nils Goroll
2008-Aug-26 12:43 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
glitch:> have you tried mounting and re-mounting all filesystems which are not^^^^^^^^^^^ unmounting
David Orman
2008-Aug-26 15:35 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
I''ve rebooted the system(s), which should accomplish this. I''m
not clear
which posts you are referring to, I just joined the list today. The ZFS pool
is being mounted automatically, and that is the only filesystem on my
system. I filed: http://defect.opensolaris.org/bz/show_bug.cgi?id=3079 (bug
3079) as well:
ormandj at neutron.corenode.com:~$ mount
/ on rpool/ROOT/opensolaris-4 read/write/setuid/devices/dev=2d90002 on Wed
Dec 31 18:00:00 1969
/devices on /devices read/write/setuid/devices/dev=4a00000 on Sat Aug 16
09:06:25 2008
/dev on /dev read/write/setuid/devices/dev=4a40000 on Sat Aug 16 09:06:25
2008
/system/contract on ctfs read/write/setuid/devices/dev=4ac0001 on Sat Aug 16
09:06:25 2008
/proc on proc read/write/setuid/devices/dev=4b00000 on Sat Aug 16 09:06:25
2008
/etc/mnttab on mnttab read/write/setuid/devices/dev=4b40001 on Sat Aug 16
09:06:25 2008
/etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=4b80001 on Sat
Aug 16 09:06:25 2008
/system/object on objfs read/write/setuid/devices/dev=4bc0001 on Sat Aug 16
09:06:25 2008
/etc/dfs/sharetab on sharefs read/write/setuid/devices/dev=4c00001 on Sat
Aug 16 09:06:25 2008
/lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1
read/write/setuid/devices/dev=2d90002 on Sat Aug 16 09:06:35 2008
/dev/fd on fd read/write/setuid/devices/dev=4d00001 on Sat Aug 16 09:06:36
2008
/tmp on swap read/write/setuid/devices/xattr/dev=4b80002 on Sat Aug 16
09:06:36 2008
/var/run on swap read/write/setuid/devices/xattr/dev=4b80003 on Sat Aug 16
09:06:36 2008
/opt on rpool/ROOT/opensolaris-4/opt
read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90004 on Sat Aug
16 09:06:36 2008
/export on rpool/export
read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90007 on Sat Aug
16 09:06:39 2008
/export/home on rpool/export/home
read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90008 on Sat Aug
16 09:06:39 2008
/rpool on rpool
read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90009 on Sat Aug
16 09:06:39 2008
/rpool/ROOT on rpool/ROOT
read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d9000a on Sat Aug
16 09:06:39 2008
ormandj at neutron.corenode.com:~$ df -h
Filesystem Size Used Avail Use% Mounted on
rpool/ROOT/opensolaris-4
120G 5.9G 114G 5% /
swap 3.5G 304K 3.5G 1% /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1
120G 5.9G 114G 5% /lib/libc.so.1
swap 3.5G 40K 3.5G 1% /tmp
swap 3.5G 36K 3.5G 1% /var/run
rpool/ROOT/opensolaris-4/opt
115G 582M 114G 1% /opt
rpool/export 114G 19K 114G 1% /export
rpool/export/home 124G 9.3G 114G 8% /export/home
rpool 114G 61K 114G 1% /rpool
rpool/ROOT 114G 18K 114G 1% /rpool/ROOT
ormandj at neutron.corenode.com:~$
On Tue, Aug 26, 2008 at 7:42 AM, Nils Goroll <slink at schokola.de> wrote:
> Hi David,
>
> have you tried mounting and re-mounting all filesystems which are not being
> mounted automatically? See other posts to zfs-discuss.
>
> Nils
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080826/485434a0/attachment.html>
David Orman
2008-Aug-26 16:00 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
After rebooting, I ran a zpool scrub on the root pool, to see if the issue
was resolved:
ormandj at neutron.corenode.com:~$ pfexec zpool status
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using ''zpool clear'' or replace the device with
''zpool replace''.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 0h11m with 1 errors on Tue Aug 26 10:58:01
2008
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 2
mirror ONLINE 0 0 2
c6d0s0 ONLINE 0 0 4
c7d0s0 ONLINE 0 0 4
errors: No known data errors
ormandj at neutron.corenode.com:~$ uptime
10:59am up 0:14, 1 user, load average: 0.02, 0.05, 0.04
ormandj at neutron.corenode.com:~$ uname -a
SunOS neutron.corenode.com 5.11 snv_95 i86pc i386 i86pc Solaris
ormandj at neutron.corenode.com:~$
Obviously not. :( Any other suggestions?
Cheers,
David
On Tue, Aug 26, 2008 at 10:35 AM, David Orman <ormandj at corenode.com>
wrote:
> I''ve rebooted the system(s), which should accomplish this.
I''m not clear
> which posts you are referring to, I just joined the list today. The ZFS
pool
> is being mounted automatically, and that is the only filesystem on my
> system. I filed: http://defect.opensolaris.org/bz/show_bug.cgi?id=3079(bug
3079) as well:
>
> ormandj at neutron.corenode.com:~$ mount
> / on rpool/ROOT/opensolaris-4 read/write/setuid/devices/dev=2d90002 on Wed
> Dec 31 18:00:00 1969
> /devices on /devices read/write/setuid/devices/dev=4a00000 on Sat Aug 16
> 09:06:25 2008
> /dev on /dev read/write/setuid/devices/dev=4a40000 on Sat Aug 16 09:06:25
> 2008
> /system/contract on ctfs read/write/setuid/devices/dev=4ac0001 on Sat Aug
> 16 09:06:25 2008
> /proc on proc read/write/setuid/devices/dev=4b00000 on Sat Aug 16 09:06:25
> 2008
> /etc/mnttab on mnttab read/write/setuid/devices/dev=4b40001 on Sat Aug 16
> 09:06:25 2008
> /etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=4b80001 on
> Sat Aug 16 09:06:25 2008
> /system/object on objfs read/write/setuid/devices/dev=4bc0001 on Sat Aug 16
> 09:06:25 2008
> /etc/dfs/sharetab on sharefs read/write/setuid/devices/dev=4c00001 on Sat
> Aug 16 09:06:25 2008
> /lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1
> read/write/setuid/devices/dev=2d90002 on Sat Aug 16 09:06:35 2008
> /dev/fd on fd read/write/setuid/devices/dev=4d00001 on Sat Aug 16 09:06:36
> 2008
> /tmp on swap read/write/setuid/devices/xattr/dev=4b80002 on Sat Aug 16
> 09:06:36 2008
> /var/run on swap read/write/setuid/devices/xattr/dev=4b80003 on Sat Aug 16
> 09:06:36 2008
> /opt on rpool/ROOT/opensolaris-4/opt
> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90004 on Sat Aug
> 16 09:06:36 2008
> /export on rpool/export
> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90007 on Sat Aug
> 16 09:06:39 2008
> /export/home on rpool/export/home
> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90008 on Sat Aug
> 16 09:06:39 2008
> /rpool on rpool
> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90009 on Sat Aug
> 16 09:06:39 2008
> /rpool/ROOT on rpool/ROOT
> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d9000a on Sat Aug
> 16 09:06:39 2008
> ormandj at neutron.corenode.com:~$ df -h
> Filesystem Size Used Avail Use% Mounted on
> rpool/ROOT/opensolaris-4
> 120G 5.9G 114G 5% /
> swap 3.5G 304K 3.5G 1% /etc/svc/volatile
> /usr/lib/libc/libc_hwcap1.so.1
> 120G 5.9G 114G 5% /lib/libc.so.1
> swap 3.5G 40K 3.5G 1% /tmp
> swap 3.5G 36K 3.5G 1% /var/run
> rpool/ROOT/opensolaris-4/opt
> 115G 582M 114G 1% /opt
> rpool/export 114G 19K 114G 1% /export
> rpool/export/home 124G 9.3G 114G 8% /export/home
> rpool 114G 61K 114G 1% /rpool
> rpool/ROOT 114G 18K 114G 1% /rpool/ROOT
> ormandj at neutron.corenode.com:~$
>
>
>
> On Tue, Aug 26, 2008 at 7:42 AM, Nils Goroll <slink at schokola.de>
wrote:
>
>> Hi David,
>>
>> have you tried mounting and re-mounting all filesystems which are not
>> being mounted automatically? See other posts to zfs-discuss.
>>
>> Nils
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080826/8a05ec2c/attachment.html>
David Orman
2008-Aug-26 17:37 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
After a helpful email from Miles, I destroyed all of my other opensolaris-*
filesystems (using beadm destroy), instead of his suggestion to
mount/unmount them all (easier this way.) I did another scrub:
ormandj at neutron.corenode.com:~$ pfexec zpool status
pool: rpool
state: ONLINE
scrub: scrub completed after 0h11m with 1 errors on Tue Aug 26 12:34:06
2008
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror ONLINE 0 0 0
c6d0s0 ONLINE 0 0 0
c7d0s0 ONLINE 0 0 0
errors: No known data errors
ormandj at neutron.corenode.com:~$
I don''t see any errors (yet), but the scrub did complete - however it
says
with an error - however nothing was displayed as an error, so no clue what
it was. I''ll try this on the other servers, and see if the checksum
errors
start occuring again after usage.
Cheers,
David
PS - This is the thread Miles pointed me to:
http://www.opensolaris.org/jive/thread.jspa?threadID=70111&tstart=30
On Tue, Aug 26, 2008 at 11:00 AM, David Orman <ormandj at corenode.com>
wrote:
> After rebooting, I ran a zpool scrub on the root pool, to see if the issue
> was resolved:
>
> ormandj at neutron.corenode.com:~$ pfexec zpool status
> pool: rpool
> state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
> attempt was made to correct the error. Applications are
> unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
> using ''zpool clear'' or replace the device with
''zpool replace''.
> see: http://www.sun.com/msg/ZFS-8000-9P
> scrub: scrub completed after 0h11m with 1 errors on Tue Aug 26 10:58:01
> 2008
> config:
>
> NAME STATE READ WRITE CKSUM
> rpool ONLINE 0 0 2
> mirror ONLINE 0 0 2
> c6d0s0 ONLINE 0 0 4
> c7d0s0 ONLINE 0 0 4
>
> errors: No known data errors
> ormandj at neutron.corenode.com:~$ uptime
> 10:59am up 0:14, 1 user, load average: 0.02, 0.05, 0.04
> ormandj at neutron.corenode.com:~$ uname -a
> SunOS neutron.corenode.com 5.11 snv_95 i86pc i386 i86pc Solaris
> ormandj at neutron.corenode.com:~$
>
> Obviously not. :( Any other suggestions?
>
> Cheers,
> David
>
>
> On Tue, Aug 26, 2008 at 10:35 AM, David Orman <ormandj at
corenode.com>wrote:
>
>> I''ve rebooted the system(s), which should accomplish this.
I''m not clear
>> which posts you are referring to, I just joined the list today. The ZFS
pool
>> is being mounted automatically, and that is the only filesystem on my
>> system. I filed:
http://defect.opensolaris.org/bz/show_bug.cgi?id=3079(bug 3079) as well:
>>
>> ormandj at neutron.corenode.com:~$ mount
>> / on rpool/ROOT/opensolaris-4 read/write/setuid/devices/dev=2d90002 on
Wed
>> Dec 31 18:00:00 1969
>> /devices on /devices read/write/setuid/devices/dev=4a00000 on Sat Aug
16
>> 09:06:25 2008
>> /dev on /dev read/write/setuid/devices/dev=4a40000 on Sat Aug 16
09:06:25
>> 2008
>> /system/contract on ctfs read/write/setuid/devices/dev=4ac0001 on Sat
Aug
>> 16 09:06:25 2008
>> /proc on proc read/write/setuid/devices/dev=4b00000 on Sat Aug 16
09:06:25
>> 2008
>> /etc/mnttab on mnttab read/write/setuid/devices/dev=4b40001 on Sat Aug
16
>> 09:06:25 2008
>> /etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=4b80001
on
>> Sat Aug 16 09:06:25 2008
>> /system/object on objfs read/write/setuid/devices/dev=4bc0001 on Sat
Aug
>> 16 09:06:25 2008
>> /etc/dfs/sharetab on sharefs read/write/setuid/devices/dev=4c00001 on
Sat
>> Aug 16 09:06:25 2008
>> /lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1
>> read/write/setuid/devices/dev=2d90002 on Sat Aug 16 09:06:35 2008
>> /dev/fd on fd read/write/setuid/devices/dev=4d00001 on Sat Aug 16
09:06:36
>> 2008
>> /tmp on swap read/write/setuid/devices/xattr/dev=4b80002 on Sat Aug 16
>> 09:06:36 2008
>> /var/run on swap read/write/setuid/devices/xattr/dev=4b80003 on Sat Aug
16
>> 09:06:36 2008
>> /opt on rpool/ROOT/opensolaris-4/opt
>> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90004 on Sat
Aug
>> 16 09:06:36 2008
>> /export on rpool/export
>> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90007 on Sat
Aug
>> 16 09:06:39 2008
>> /export/home on rpool/export/home
>> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90008 on Sat
Aug
>> 16 09:06:39 2008
>> /rpool on rpool
>> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90009 on Sat
Aug
>> 16 09:06:39 2008
>> /rpool/ROOT on rpool/ROOT
>> read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d9000a on Sat
Aug
>> 16 09:06:39 2008
>> ormandj at neutron.corenode.com:~$ df -h
>> Filesystem Size Used Avail Use% Mounted on
>> rpool/ROOT/opensolaris-4
>> 120G 5.9G 114G 5% /
>> swap 3.5G 304K 3.5G 1% /etc/svc/volatile
>> /usr/lib/libc/libc_hwcap1.so.1
>> 120G 5.9G 114G 5% /lib/libc.so.1
>> swap 3.5G 40K 3.5G 1% /tmp
>> swap 3.5G 36K 3.5G 1% /var/run
>> rpool/ROOT/opensolaris-4/opt
>> 115G 582M 114G 1% /opt
>> rpool/export 114G 19K 114G 1% /export
>> rpool/export/home 124G 9.3G 114G 8% /export/home
>> rpool 114G 61K 114G 1% /rpool
>> rpool/ROOT 114G 18K 114G 1% /rpool/ROOT
>> ormandj at neutron.corenode.com:~$
>>
>>
>>
>> On Tue, Aug 26, 2008 at 7:42 AM, Nils Goroll <slink at
schokola.de> wrote:
>>
>>> Hi David,
>>>
>>> have you tried mounting and re-mounting all filesystems which are
not
>>> being mounted automatically? See other posts to zfs-discuss.
>>>
>>> Nils
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080826/6974ae84/attachment.html>