Ciaran Cummins
2011-Jul-12 12:04 UTC
[zfs-discuss] Can zpool permanent errors fixed by scrub?
Hi, we had a server that lost its connection to a fiber-attached disk array, where the data LUNs were housed, due to a 3510 power fault. After the connection was restored, a lot of the zpool status output had these permanent errors listed, as per below. I checked the files in question and as far as I could see they were present and OK. I ran a zpool scrub against the other zpools and they came back with no errors, and the list of permanent errors has cleared. Does this mean the files were OK and ZFS just got a bit confused by the disappearing and re-appearing LUNs? Or does the scrub just reset the list of errors, and there could still be data loss?

This is the last one in progress; it will take a while to finish... I also checked the contents of some logs and text files that were mentioned previously, and they looked OK before the scrub. Do I need to checksum the original and backed-up files?

root@# zpool status -xv
  pool: APP1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h7m, 3.33% done, 3h32m to go
config:

        NAME                                     STATE     READ WRITE CKSUM
        APP1                                     ONLINE       0     0     0
          c4t600C0FF0000000000B1F6138A0BD8801d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /zones/cctsprod/root/ccts/grid_agent/app/agent10g/network/mesg/tnsus.msb
        /zones/cctsprod/root/ccts/grid_agent/app/agent10g/rdbms/mesg/oraus.msb
        /zones/cctsprod/root/ccts/grid_agent/app/agent10g/perl/lib/5.8.3/sun4-solaris-thread-multi/auto/Socket/Socket.so
        /zones/cctsprod/root/ccts/grid_agent/app/agent10g/lib32/libnmefsql.so
        /zones/cctsprod/root/ccts/grid_agent/app/agent10g/network/log/sqlnet.log
        /zones/cctsprod/root/ccts/grid_agent/app/agent10g/sysman/log/emagent.trc
        /zones/cctsprod/root/ccts/grid_agent/app/agent10g/sysman/log/emdctl.trc
        /zones/cctsprod/root/ccts/grid_agent/app/agent10g/sysman/log/emagent.nohup
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libons.so
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libskgxn2.so
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libocr10.so
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libocrb10.so
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libocrutl10.so
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libhasgen10.so
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libclsra10.so
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/rdbms/mesg/oraus.msb
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libnnz10.so
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/dbs/hc_ccts.dat
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/bin/tnslsnr
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libclntsh.so.10.1
        /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib32/libclntsh.so.10.1

root@audcourtdb1 # ls -la /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib32/libclntsh.so.10.1
-rwxr-xr-x 1 oracle oinstall 22033380 Jun 25 12:00 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib32/libclntsh.so.10.1
root@audcourtdb1 # vi list.files
"list.files" [New file]
[I put the list of files mentioned above in here]
"list.files" [New file] 22 lines, 1696 characters
root@audcourtdb1 #
root@audcourtdb1 # for i in `cat list.files`
root@audcourtdb1 > do
root@audcourtdb1 > ls -la $i
root@audcourtdb1 > done
-rw-r--r-- 1 oracle oinstall 47104 Feb 27 2008 /zones/cctsprod/root/ccts/grid_agent/app/agent10g/network/mesg/tnsus.msb
-rw-r--r-- 1 oracle oinstall 849408 Apr 4 2008 /zones/cctsprod/root/ccts/grid_agent/app/agent10g/rdbms/mesg/oraus.msb
-rwxr-xr-x 1 oracle oinstall 41304 Jul 7 2004 /zones/cctsprod/root/ccts/grid_agent/app/agent10g/perl/lib/5.8.3/sun4-solaris-thread-multi/auto/Socket/Socket.so
-rwxr-x--- 1 oracle oinstall 71368 Aug 6 2009 /zones/cctsprod/root/ccts/grid_agent/app/agent10g/lib32/libnmefsql.so
-rw-r----- 1 oracle oinstall 343664 Feb 22 18:19 /zones/cctsprod/root/ccts/grid_agent/app/agent10g/network/log/sqlnet.log
-rw-r--r-- 1 oracle oinstall 140643 Jul 12 06:51 /zones/cctsprod/root/ccts/grid_agent/app/agent10g/sysman/log/emagent.trc
-rw-r----- 1 oracle oinstall 7125 Jun 25 14:03 /zones/cctsprod/root/ccts/grid_agent/app/agent10g/sysman/log/emdctl.trc
-rw-r--r-- 1 oracle oinstall 2357 Jun 25 14:04 /zones/cctsprod/root/ccts/grid_agent/app/agent10g/sysman/log/emagent.nohup
-rw-r--r-- 1 oracle oinstall 111544 Feb 21 2008 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libons.so
-rwxr-xr-x 1 oracle oinstall 11216 Feb 20 2006 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libskgxn2.so
-rw-r--r-- 1 oracle oinstall 3415512 Mar 15 2008 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libocr10.so
-rw-r--r-- 1 oracle oinstall 3342832 Mar 15 2008 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libocrb10.so
-rw-r--r-- 1 oracle oinstall 2847912 Mar 15 2008 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libocrutl10.so
-rw-r--r-- 1 oracle oinstall 3816088 Mar 15 2008 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libhasgen10.so
-rw-r--r-- 1 oracle oinstall 2834584 Mar 15 2008 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libclsra10.so
-rw-r--r-- 1 oracle oinstall 849408 Apr 4 2008 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/rdbms/mesg/oraus.msb
-rwxr-xr-x 1 oracle oinstall 7613960 Apr 13 09:58 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libnnz10.so
-rw-rw---- 1 oracle oinstall 1552 Jul 12 12:51 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/dbs/hc_ccts.dat
-rwxr-x--x 1 oracle oinstall 611440 Nov 6 2010 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/bin/tnslsnr
-rwxr-xr-x 1 oracle oinstall 25540064 Jun 25 12:00 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib/libclntsh.so.10.1
-rwxr-xr-x 1 oracle oinstall 22033380 Jun 25 12:00 /zones/cctsprod/root/ccts/oracle/app/product/10.2.0/lib32/libclntsh.so.10.1
root@audcourtdb1 #

Any advice appreciated,
Ciaran
--
This message posted from opensolaris.org
Ian Collins
2011-Jul-12 21:14 UTC
[zfs-discuss] Can zpool permanent errors fixed by scrub?
On 07/13/11 12:04 AM, Ciaran Cummins wrote:
> Hi, we had a server that lost its connection to a fiber-attached disk array, where the data LUNs were housed, due to a 3510 power fault. After the connection was restored, a lot of the zpool status output had these permanent errors listed, as per below. I checked the files in question and as far as I could see they were present and OK. I ran a zpool scrub against the other zpools and they came back with no errors, and the list of permanent errors has cleared. Does this mean the files were OK and ZFS just got a bit confused by the disappearing and re-appearing LUNs? Or does the scrub just reset the list of errors, and there could still be data loss?

I had the same issue when an x4500 brain-farted. I was going to restore after the scrub but, as on your system, the errors cleared. All the files appeared OK, but I would also like to know if this is expected behaviour.

Have you tried running a checksum on some of the files to compare with a backup?

If the file is really corrupt, running digest (or anything that reads the whole file) will fail with an I/O error.

--
Ian.
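
The check suggested above (read every listed file in full, and optionally compare it with a backup copy) could be scripted roughly as follows. This is only a sketch under some assumptions that are not from the thread: it needs a POSIX shell (ksh93, bash or /usr/xpg4/bin/sh rather than the legacy Bourne shell), it uses the standard cksum utility in place of digest, the function name check_files is made up for illustration, and the optional second argument is a hypothetical directory under which a backup has been restored with the same absolute paths.

```shell
# Sketch only. Assumes a POSIX shell and the standard cksum utility.
# $1: file containing one absolute path per line (e.g. list.files)
# $2: optional, hypothetical root directory of a restored backup copy
check_files() {
    list=$1
    backup_root=${2:-}
    bad=0
    while IFS= read -r f; do
        [ -n "$f" ] || continue                  # skip blank lines
        # cksum has to read the whole file, so a read error on any
        # block makes it exit non-zero.
        if ! live=$(cksum 2>/dev/null < "$f"); then
            echo "READ-FAILED $f"
            bad=$((bad + 1))
        # A missing backup copy yields an empty checksum and is
        # therefore reported as DIFFERS as well.
        elif [ -n "$backup_root" ] &&
             [ "$live" != "$(cksum 2>/dev/null < "$backup_root$f")" ]; then
            echo "DIFFERS $f"
            bad=$((bad + 1))
        else
            echo "OK $f"
        fi
    done < "$list"
    echo "$bad problem file(s)"
    [ "$bad" -eq 0 ]                             # exit status: 0 if all clean
}
```

With the list from the original post this would be run as "check_files list.files", or as "check_files list.files /restore" when a backup has been restored under /restore (a made-up path). Files that are still being written, such as the .log and .trc files in the list, can be expected to differ from any backup, so a DIFFERS result only means something for the static binaries and libraries.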