thr3ads.net - zfs discuss - [zfs-discuss] Checksum errors... [Jan 2007]

eric kustarz

2007-Jan-04 19:21 UTC

[zfs-discuss] Checksum errors...

> errors: The following persistent errors have been detected:
> 
>           DATASET                      OBJECT  RANGE
>           z_tsmsun1_pool/tsmsrv1_pool  2620    8464760832-8464891904
> 
> Looks like I have possibly a single file that is corrupted.  My question is
> how do I find the file.  Is it as simple as doing a find command using
> "-inum 2620"?
> 
FYI, I'm finishing up:
6410433 'zpool status -v' would be more useful with filenames

That fix will give you the complete path to the file (if applicable), so
you don't have to do a 'find' on the inum.

eric

Robert Milkowski

2007-Jan-10 16:26 UTC

[zfs-discuss] Re: Checksum errors...

Hello John,

Thursday, December 28, 2006, 12:59:34 PM, you wrote:

J> Ok... guess I answered my own question... LOL!

J> I did the find with the -inum... gave me a file name... so i did:

J> tsmsun1 - /tsmsrv1_pool >dd if=000203db.bfs of=/dev/null bs=128k
J> read: I/O error
J> 64581+0 records in
J> 64581+0 records out

J> So.... it would appear the file is poo-poo....

J> Now the interesting thoughts... These 3511s have been around for
J> a couple of years.  We were using them with Veritas VxFS...  We
J> only recently switched over to ZFS to take advantage of
J> compression.... So is it safe to say that I was lucky using VxFS
J> and never had any corruption or was I suffering from silent
J> corruption under VxFS.... hmmm.....

My guess is that you had silent corruption before.
On 3511s with SATA drives I also get checksum errors from time to
time, while on 3510s with FC drives I haven't seen any (yet).
That would also explain why we had to run fsck every few months before...

--
Best regards,
 Robert                            mailto:rmilkowski@task.gda.pl
                                       http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

John

2007-Jan-10 16:26 UTC

[zfs-discuss] Re: Checksum errors...

Ok... guess I answered my own question... LOL!

I did the find with the -inum... gave me a file name... so i did:

tsmsun1 - /tsmsrv1_pool >dd if=000203db.bfs of=/dev/null bs=128k
read: I/O error
64581+0 records in
64581+0 records out

So.... it would appear the file is poo-poo....
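
For anyone who wants to repeat the dd test from a script, here is a minimal sketch in Python. It reads a file in fixed-size blocks and reports how many full blocks came back before the first read error. The function name probe_read is made up for this example, and the 128 KiB default simply mirrors the bs=128k used above.

```python
def probe_read(path, block_size=128 * 1024):
    """Read `path` sequentially, like dd if=path of=/dev/null bs=block_size.

    Returns (full_blocks, error): the number of complete blocks read and
    the OSError that stopped the read, or None if end of file was reached.
    """
    full_blocks = 0
    # buffering=0 gives raw reads, so each read() is one request of block_size
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(block_size)
            except OSError as exc:
                # Typically EIO when the filesystem cannot return valid data
                return full_blocks, exc
            if not chunk:
                return full_blocks, None  # clean end of file
            if len(chunk) == block_size:
                full_blocks += 1  # a short final block is not counted as full
```

A result such as (64581, OSError(...)) would correspond to the "64581+0 records in" line above: the first 64581 blocks were readable and the next one was not.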

Now the interesting thoughts... These 3511s have been around for a
couple of years.  We were using them with Veritas VxFS...  We only recently
switched over to ZFS to take advantage of compression.... So is it safe to say
that I was lucky using VxFS and never had any corruption or was I suffering from
silent corruption under VxFS.... hmmm.....

thanks!
john


This message posted from opensolaris.org

John

2007-Jan-10 16:26 UTC

[zfs-discuss] Re: Re: Checksum errors...

Thanks for the reply!

As it turns out, I ran a parity check on the suspect 3511... sure enough, it
popped an error!  So ZFS did detect the problem with the 3511...



John

2007-Jan-10 16:26 UTC

[zfs-discuss] Checksum errors...

Background:
Large ZFS pool built on a couple of Sun 3511 SATA arrays. RAID-5 is done in the
3511s. ZFS is non-redundant. We have been using this setup for a couple of
months now with no issues.

Problem:
Yesterday afternoon we started getting checksum errors.  There have been no
hardware errors reported at either the Solaris level or the hardware level. 
3511 logs are clean. Here is the zpool status:

tsmsun1 - /home/root >zpool status -xv
  pool: z_tsmsun1_pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                                        STATE     READ WRITE CKSUM
        z_tsmsun1_pool                              ONLINE       0     0   180
          c22t600C0FF00000000000678A0A86F3D901d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A0A86F3D900d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068190A86F3D901d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068190A86F3D900d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068191A598ED500d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A1A598ED500d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068191A598ED501d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681943A7223100d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681943A7223101d0    ONLINE       0     0     0
          c22t600C0FF00000000000681932BBD24400d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681932BBD24401d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A43A7223100d0s0  ONLINE       0     0   180
          c22t600C0FF00000000000678A2055211B01d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A2055211B00d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A32BBD24401d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A1A598ED501d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A32BBD24400d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A43A7223101d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068192055211B00d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068192055211B01d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A44F3D81B00d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A44F3D81B01d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681944F3D81B00d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681944F3D81B01d0s0  ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET                      OBJECT  RANGE
          z_tsmsun1_pool/tsmsrv1_pool  2620    8464760832-8464891904
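
A note on reading the RANGE column: the arithmetic below is a small sketch that assumes the two numbers are byte offsets within the object and that the file uses 128 KiB blocks (the ZFS default recordsize, and the block size used with dd elsewhere in this thread). The helper name locate_range is hypothetical.

```python
def locate_range(start, end, block_size=128 * 1024):
    """Map a byte range to a block index, offset within that block, and length.

    Assumes `start` and `end` are byte offsets into the object.
    """
    first_block, offset_in_block = divmod(start, block_size)
    return {
        "first_block": first_block,          # zero-based index of the block holding `start`
        "offset_in_block": offset_in_block,  # 0 means the range is block-aligned
        "length": end - start,               # size of the reported range in bytes
    }


if __name__ == "__main__":
    print(locate_range(8464760832, 8464891904))
```

Under those assumptions the range is exactly one 128 KiB block (131072 bytes), aligned at block index 64581. That lines up with the dd run reported in this thread, which read 64581 full 128k records before hitting the I/O error.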

Looks like I have possibly a single file that is corrupted.  My question is how
do I find the file.  Is it as simple as doing a find command using "-inum
2620"?
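
For reference, the same lookup that find -inum performs can be sketched in Python. This is only an illustration: it assumes the OBJECT number shown by zpool status matches the file's inode number (which is what the find approach relies on), and the function name find_by_inode is invented for this example.

```python
import os


def _lstat_or_none(path):
    """lstat a path, returning None if it vanished or cannot be read."""
    try:
        return os.lstat(path)
    except OSError:
        return None


def find_by_inode(root, inode):
    """Return every path under `root` whose inode number equals `inode`.

    Stays on the filesystem that `root` lives on, because inode numbers
    are only unique within a single filesystem.
    """
    root_dev = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        same_fs_dirs = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            st = _lstat_or_none(path)
            if st is None or st.st_dev != root_dev:
                continue  # unreadable, or a different filesystem: do not descend
            same_fs_dirs.append(name)
            if st.st_ino == inode:
                matches.append(path)
        dirnames[:] = same_fs_dirs  # prune the walk in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = _lstat_or_none(path)
            if st is not None and st.st_dev == root_dev and st.st_ino == inode:
                matches.append(path)
    return matches
```

Calling find_by_inode("/tsmsrv1_pool", 2620) would be the equivalent for this case, assuming the dataset is mounted at /tsmsrv1_pool. A file with several hard links would show up once per link.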

TIA,
john

