David Orman
2008-Aug-26 10:54 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
Hi,

After upgrading to b95 of OSOL/Indiana, and doing a ZFS upgrade to the newer revision, all arrays I have using ZFS mirroring are displaying errors. This started happening immediately after the ZFS upgrades. Here is an example:

ormandj at neutron.corenode.com:~$ zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h12m with 6 errors on Wed Aug 20 06:26:33 2008
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       DEGRADED     0     0    12
          mirror    DEGRADED     0     0    12
            c6d0s0  DEGRADED     0     0    24  too many errors
            c7d0s0  DEGRADED     0     0    24  too many errors

errors: No known data errors
ormandj at neutron.corenode.com:~$

ormandj at neutron.corenode.com:~$ zpool get all rpool
NAME   PROPERTY     VALUE                     SOURCE
rpool  size         137G                      -
rpool  used         21.0G                     -
rpool  available    116G                      -
rpool  capacity     15%                       -
rpool  altroot      -                         default
rpool  health       DEGRADED                  -
rpool  guid         10397084409580638341      -
rpool  version      11                        default
rpool  bootfs       rpool/ROOT/opensolaris-4  local
rpool  delegation   on                        default
rpool  autoreplace  off                       default
rpool  cachefile    -                         default
rpool  failmode     wait                      default
ormandj at neutron.corenode.com:~$

ormandj at neutron.corenode.com:~$ pfexec prtvtoc /dev/rdsk/c6d0s0
* /dev/rdsk/c6d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*   18240 cylinders
*   18238 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00    4209030 288784440 292993469
       1      3    01      16065   4192965   4209029
       2      5    01          0 292993470 292993469
       8      1    01          0     16065     16064

ormandj at neutron.corenode.com:~$ pfexec prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*   18240 cylinders
*   18238 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00    4209030 288784440 292993469
       1      3    01      16065   4192965   4209029
       2      5    01          0 292993470 292993469
       8      1    01          0     16065     16064
ormandj at neutron.corenode.com:~$

These are root pools, and this is the method in which they were created:

#1 - Partition first drive during installation.
#2 - Once OSOL was installed:
     prtvtoc /dev/rdsk/c6d0s0 | fmthard -s - /dev/rdsk/c7d0s0
#3 - zpool attach rpool c6d0s0 c7d0s0
#4 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c6d0s0
#5 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c7d0s0

Anybody have any ideas? I'd blame failing hardware, but this started happening on 3 different machines, immediately following the version upgrade (it prompted me to upgrade my pools and I did.) I thought there might be a problem, so I detached the secondary drives on all systems and performed the steps listed above again. Still having errors.

Please let me know what other data would be useful for you, and I'll provide it; I'm interested in getting this resolved as soon as possible.

Cheers,
David
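The five steps above can be sketched as a single script. This is only a sketch: the device names c6d0s0/c7d0s0 are specific to this system, and both fmthard and installgrub overwrite the target disk's label and boot blocks, so the assumption is that the second disk is blank.

```shell
#!/bin/sh
# Sketch of the root-pool mirror procedure described above.
# Assumes an existing rpool on $SRC and a same-size blank disk for $DST.
SRC=c6d0s0
DST=c7d0s0

# Copy the partition table from the first disk to the second.
prtvtoc /dev/rdsk/$SRC | pfexec fmthard -s - /dev/rdsk/$DST

# Attach the second slice to the root pool, forming a mirror;
# this triggers a resilver of the existing data onto $DST.
pfexec zpool attach rpool $SRC $DST

# Install GRUB on both disks so either one can boot the system.
pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/$SRC
pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/$DST

# Check resilver progress; the mirror is redundant only once it completes.
pfexec zpool status rpool
```

The resilver-completion check matters: detaching or rebooting before it finishes leaves the new half of the mirror incomplete.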
Nils Goroll
2008-Aug-26 12:42 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
Hi David, have you tried mounting and re-mounting all filesystems which are not being mounted automatically? See other posts to zfs-discuss. Nils
Nils Goroll
2008-Aug-26 12:43 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
Glitch - that should have read:

> have you tried mounting and re-mounting all filesystems which are not
                              ^^^^^^^^^^^
                              unmounting
David Orman
2008-Aug-26 15:35 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
I've rebooted the system(s), which should accomplish this. I'm not clear which posts you are referring to; I just joined the list today. The ZFS pool is being mounted automatically, and that is the only filesystem on my system. I filed http://defect.opensolaris.org/bz/show_bug.cgi?id=3079 (bug 3079) as well:

ormandj at neutron.corenode.com:~$ mount
/ on rpool/ROOT/opensolaris-4 read/write/setuid/devices/dev=2d90002 on Wed Dec 31 18:00:00 1969
/devices on /devices read/write/setuid/devices/dev=4a00000 on Sat Aug 16 09:06:25 2008
/dev on /dev read/write/setuid/devices/dev=4a40000 on Sat Aug 16 09:06:25 2008
/system/contract on ctfs read/write/setuid/devices/dev=4ac0001 on Sat Aug 16 09:06:25 2008
/proc on proc read/write/setuid/devices/dev=4b00000 on Sat Aug 16 09:06:25 2008
/etc/mnttab on mnttab read/write/setuid/devices/dev=4b40001 on Sat Aug 16 09:06:25 2008
/etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=4b80001 on Sat Aug 16 09:06:25 2008
/system/object on objfs read/write/setuid/devices/dev=4bc0001 on Sat Aug 16 09:06:25 2008
/etc/dfs/sharetab on sharefs read/write/setuid/devices/dev=4c00001 on Sat Aug 16 09:06:25 2008
/lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1 read/write/setuid/devices/dev=2d90002 on Sat Aug 16 09:06:35 2008
/dev/fd on fd read/write/setuid/devices/dev=4d00001 on Sat Aug 16 09:06:36 2008
/tmp on swap read/write/setuid/devices/xattr/dev=4b80002 on Sat Aug 16 09:06:36 2008
/var/run on swap read/write/setuid/devices/xattr/dev=4b80003 on Sat Aug 16 09:06:36 2008
/opt on rpool/ROOT/opensolaris-4/opt read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90004 on Sat Aug 16 09:06:36 2008
/export on rpool/export read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90007 on Sat Aug 16 09:06:39 2008
/export/home on rpool/export/home read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90008 on Sat Aug 16 09:06:39 2008
/rpool on rpool read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90009 on Sat Aug 16 09:06:39 2008
/rpool/ROOT on rpool/ROOT read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d9000a on Sat Aug 16 09:06:39 2008

ormandj at neutron.corenode.com:~$ df -h
Filesystem                      Size  Used Avail Use% Mounted on
rpool/ROOT/opensolaris-4        120G  5.9G  114G   5% /
swap                            3.5G  304K  3.5G   1% /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1  120G  5.9G  114G   5% /lib/libc.so.1
swap                            3.5G   40K  3.5G   1% /tmp
swap                            3.5G   36K  3.5G   1% /var/run
rpool/ROOT/opensolaris-4/opt    115G  582M  114G   1% /opt
rpool/export                    114G   19K  114G   1% /export
rpool/export/home               124G  9.3G  114G   8% /export/home
rpool                           114G   61K  114G   1% /rpool
rpool/ROOT                      114G   18K  114G   1% /rpool/ROOT
ormandj at neutron.corenode.com:~$

On Tue, Aug 26, 2008 at 7:42 AM, Nils Goroll <slink at schokola.de> wrote:
> Hi David,
>
> have you tried mounting and re-mounting all filesystems which are not being
> mounted automatically? See other posts to zfs-discuss.
>
> Nils
David Orman
2008-Aug-26 16:00 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
After rebooting, I ran a zpool scrub on the root pool, to see if the issue was resolved:

ormandj at neutron.corenode.com:~$ pfexec zpool status
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h11m with 1 errors on Tue Aug 26 10:58:01 2008
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     2
          mirror    ONLINE       0     0     2
            c6d0s0  ONLINE       0     0     4
            c7d0s0  ONLINE       0     0     4

errors: No known data errors
ormandj at neutron.corenode.com:~$ uptime
 10:59am  up 0:14,  1 user,  load average: 0.02, 0.05, 0.04
ormandj at neutron.corenode.com:~$ uname -a
SunOS neutron.corenode.com 5.11 snv_95 i86pc i386 i86pc Solaris
ormandj at neutron.corenode.com:~$

Obviously not. :( Any other suggestions?

Cheers,
David

On Tue, Aug 26, 2008 at 10:35 AM, David Orman <ormandj at corenode.com> wrote:
> I've rebooted the system(s), which should accomplish this. I'm not clear
> which posts you are referring to, I just joined the list today. The ZFS pool
> is being mounted automatically, and that is the only filesystem on my
> system. I filed: http://defect.opensolaris.org/bz/show_bug.cgi?id=3079
> (bug 3079) as well:
> [mount and df -h output quoted from the previous message snipped]
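The recheck cycle used here can be sketched as follows. This is a hedged sketch, not a prescribed fix: it assumes the pool is named rpool, and it uses zpool status -v, which lists the files affected by any recorded errors.

```shell
# Sketch: reset the error counters, run a fresh scrub, then inspect.
# Run via pfexec (or as root); pool name assumed to be rpool.
pfexec zpool clear rpool        # zero the READ/WRITE/CKSUM counters
pfexec zpool scrub rpool        # start a new scrub of the whole pool
pfexec zpool status -v rpool    # -v lists affected files, if any are known
```

Clearing first matters for diagnosis: any non-zero counters after the new scrub were produced by that scrub, not left over from before the reboot.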
David Orman
2008-Aug-26 17:37 UTC
[zfs-discuss] Problem w/ b95 + ZFS (version 11) - seeing fair number of errors on multiple machines
After a helpful email from Miles, I destroyed all of my other opensolaris-* filesystems (using beadm destroy) instead of following his suggestion to mount/unmount them all (easier this way). I did another scrub:

ormandj at neutron.corenode.com:~$ pfexec zpool status
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h11m with 1 errors on Tue Aug 26 12:34:06 2008
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6d0s0  ONLINE       0     0     0
            c7d0s0  ONLINE       0     0     0

errors: No known data errors
ormandj at neutron.corenode.com:~$

I don't see any checksum errors (yet), and the scrub did complete - but it reports completing with 1 error, while nothing is listed as an error, so I have no clue what it was. I'll try this on the other servers and see if the checksum errors start occurring again after usage.

Cheers,
David

PS - This is the thread Miles pointed me to:
http://www.opensolaris.org/jive/thread.jspa?threadID=70111&tstart=30

On Tue, Aug 26, 2008 at 11:00 AM, David Orman <ormandj at corenode.com> wrote:
> After rebooting, I ran a zpool scrub on the root pool, to see if the issue
> was resolved:
> [zpool status output quoted from the previous message snipped]
> Obviously not. :( Any other suggestions?
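The boot-environment cleanup described above can be sketched as follows. A sketch under stated assumptions: the BE name opensolaris-1 is an example, the active BE here is opensolaris-4 and must be kept, and beadm destroy permanently removes a boot environment and its datasets.

```shell
# Sketch: list boot environments, remove the inactive ones, then re-scrub.
beadm list                        # shows BEs; the active one is flagged
pfexec beadm destroy opensolaris-1   # example name; repeat for each old BE
pfexec zpool scrub rpool             # re-verify the pool after cleanup
pfexec zpool status rpool            # confirm the checksum counters stay 0
```

Destroying a BE frees the snapshots and clones it held, which is why it substitutes here for mounting and unmounting each old root filesystem by hand.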